Improving suffix array locality for fast pattern matching on disk

  • Authors:
  • Ranjan Sinha;Simon Puglisi;Alistair Moffat;Andrew Turpin

  • Affiliations:
  • The University of Melbourne, Melbourne, Australia;RMIT University, Melbourne, Australia;The University of Melbourne, Melbourne, Australia;RMIT University, Melbourne, Australia

  • Venue:
  • Proceedings of the 2008 ACM SIGMOD international conference on Management of data
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

The suffix tree (or equivalently, the enhanced suffix array) provides efficient solutions to many problems involving pattern matching and pattern discovery in large strings, such as those arising in computational biology. Here we address the problem of arranging a suffix array on disk so that querying is fast in practice. We show that the combination of a small trie and a suffix array-like blocked data structure allows queries to be answered as much as three times faster than the best alternative disk-based suffix array arrangement. Construction of our data structure requires only modest processing time on top of that required to build the suffix tree, and requires negligible extra memory.