Improving suffix array locality for fast pattern matching on disk

Authors:
Ranjan Sinha;Simon Puglisi;Alistair Moffat;Andrew Turpin
Affiliations:
The University of Melbourne, Melbourne, Australia;RMIT University, Melbourne, Australia;The University of Melbourne, Melbourne, Australia;RMIT University, Melbourne, Australia
Venue:
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Year:
2008

Abstract

The suffix tree (or equivalently, the enhanced suffix array) provides efficient solutions to many problems involving pattern matching and pattern discovery in large strings, such as those arising in computational biology. Here we address the problem of arranging a suffix array on disk so that querying is fast in practice. We show that the combination of a small trie and a suffix array-like blocked data structure allows queries to be answered as much as three times faster than the best alternative disk-based suffix array arrangement. Construction of our data structure requires only modest processing time on top of that required to build the suffix tree, and requires negligible extra memory.
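
As background for the abstract above, the following is a minimal in-memory sketch of how a plain suffix array answers a pattern query by binary search. It is not the trie-plus-blocked layout the paper proposes, and it makes no attempt to model disk access. The function names `build_suffix_array` and `find_occurrences` are hypothetical, and the naive sort-based construction is an assumption chosen for brevity rather than efficiency.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted suffix order.

    Naive construction for illustration only: it sorts materialized suffixes,
    which is far too slow and memory-hungry for large inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return every position in text where pattern occurs, in increasing order.

    All suffixes that begin with pattern are contiguous in the suffix array,
    so two binary searches locate the boundaries of that range.
    """
    m = len(pattern)

    # First binary search: leftmost suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Second binary search: leftmost suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # sa[start:lo] holds the matches in suffix order; sort to report by position.
    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
matches = find_occurrences(text, sa, "ana")
```

Each probe in the binary search touches a different, widely separated part of the array and of the text. On disk, that access pattern is costly, which is the locality problem the abstract refers to.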