Toward optimal disk layout of genome scale suffix trees

  • Authors:
  • Vikas K. Garg

  • Affiliations:
  • IBM Research - India

  • Venue:
  • SEAL'10 Proceedings of the 8th international conference on Simulated evolution and learning
  • Year:
  • 2010

Abstract

Suffix trees provide for efficient indexing of numerous sequence processing problems in biological databases. We address the pivotal issue of improving the search efficiency of disk-resident suffix trees by improving the storage layout from a statistical learning viewpoint. In particular, we make the following contributions: we (a) introduce the Q-Optimal Disk Layout (Q-OptDL) problem in the context of suffix trees and prove it to be NP-Hard, and (b) propose an algorithm for improving the layout of suffix trees that is guaranteed to perform asymptotically no worse than twice the optimal disk layout.