FEMTO: fast search of large sequence collections

  • Authors:
  • Michael P. Ferguson

  • Affiliations:
  • Laboratory for Telecommunications Sciences, College Park, Maryland

  • Venue:
  • CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present FEMTO, a new system for indexing and searching large collections of sequence data. We used FEMTO to index and search three large collections, including one 182 GB collection. We compare the performance of FEMTO indexing and search with Bowtie and with Lucene, and we compare performance with indexes stored on hard disks and in flash memory. To our knowledge, we report on the first compressed suffix array storing more than 100 GB. Even for the largest collection, most searches completed in under 10 seconds.