A new method for indexing genomes using on-disk suffix trees

  • Authors:
  • Marina Barsky; Ulrike Stege; Alex Thomo; Chris Upton

  • Affiliations:
  • University of Victoria, Victoria, BC, Canada (all authors)

  • Venue:
  • Proceedings of the 17th ACM Conference on Information and Knowledge Management
  • Year:
  • 2008

Abstract

We propose a new method for building persistent suffix trees to index genomic data. Our algorithm, DiGeST (Disk-Based Genomic Suffix Tree), improves significantly on previous work by reducing random access to the input string and by making only two passes over the data on disk. DiGeST follows the two-phase multi-way merge sort paradigm and uses a concise binary representation of the DNA alphabet. Furthermore, our method scales to larger genomic data sets than previously handled.
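
The two ideas named in the abstract, a compact binary encoding of the DNA alphabet and a two-phase multi-way merge sort, can be illustrated with a small Python sketch. This is not the authors' DiGeST implementation: the function names (`pack_dna`, `unpack_dna`, `sorted_runs`, `merge_runs`) and the 2-bit code table are assumptions made for illustration, in-memory lists stand in for sorted runs on disk, and the result is a sorted order of suffix positions rather than a suffix tree.

```python
import heapq

# Assumed 2-bit code. The order A < C < G < T matches lexicographic order.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_dna(seq):
    """Pack a DNA string into an integer, 2 bits per base (hypothetical helper)."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def unpack_dna(value, length):
    """Inverse of pack_dna. The length is required because leading A's are zero bits."""
    return "".join(
        BASES[(value >> (2 * (length - 1 - i))) & 3] for i in range(length)
    )


def sorted_runs(text, run_size):
    """Phase 1: split the suffix start positions into partitions, sort each one."""
    runs = []
    for start in range(0, len(text), run_size):
        chunk = list(range(start, min(start + run_size, len(text))))
        chunk.sort(key=lambda i: text[i:])  # compare suffixes lexicographically
        runs.append(chunk)
    return runs


def merge_runs(text, runs):
    """Phase 2: multi-way merge of the sorted runs into one global suffix order."""
    return list(heapq.merge(*runs, key=lambda i: text[i:]))


if __name__ == "__main__":
    genome = "GATTACA"
    runs = sorted_runs(genome, run_size=3)
    print(merge_runs(genome, runs))
    print(pack_dna("ACGT"), unpack_dna(pack_dna("ACGT"), 4))
```

In this toy version each run fits in memory and suffixes are compared as Python strings. A disk-based variant would write each sorted run to a file and stream the runs through the merge, which is where sequential passes over the data replace random access.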