Reducing Space Requirements for Disk Resident Suffix Arrays

Authors:
Alistair Moffat; Simon J. Puglisi; Ranjan Sinha
Affiliations:
Department of Computer Science and Software Engineering, The University of Melbourne, Australia; School of Computer Science and Information Technology, RMIT University, Melbourne, Australia; Department of Computer Science and Software Engineering, The University of Melbourne, Australia
Venue:
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
Year:
2009

References:

Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing.
Hierarchies of indices for text searching. Information Systems.
Algorithms on strings, trees, and sequences: computer science and computational biology.
A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM (JACM).
On the sorting-complexity of suffix tree construction. Journal of the ACM (JACM).
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications. CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching.
Offline Dictionary-Based Compression. DCC '99 Proceedings of the Conference on Data Compression.
Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms - SPIRE 2002.
Inverted Index Compression Using Word-Aligned Binary Codes. Information Retrieval.
Constructing Suffix Tree for Gigabyte Sequences with Megabyte Memory. IEEE Transactions on Knowledge and Data Engineering.
Practical methods for constructing suffix trees. The VLDB Journal - The International Journal on Very Large Data Bases.
Compressed full-text indexes. ACM Computing Surveys (CSUR).
Genome-scale disk-based suffix tree indexing. Proceedings of the 2007 ACM SIGMOD international conference on Management of data.
Improving suffix array locality for fast pattern matching on disk. Proceedings of the 2008 ACM SIGMOD international conference on Management of data.
Better external memory suffix array construction. Journal of Experimental Algorithmics (JEA).
Compressed Text Indexes with Fast Locate. CPM '07 Proceedings of the 18th annual symposium on Combinatorial Pattern Matching.
Linear pattern matching algorithms. SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973).
Enhanced byte codes with restricted prefix properties. SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval.

Abstract

Suffix trees and suffix arrays are important data structures for string processing, providing efficient solutions for many applications involving pattern matching. Recent work by Sinha et al. (SIGMOD 2008) addressed the problem of arranging a suffix array on disk so that querying is fast, and showed that combining a small trie with a suffix array-like blocked data structure allows queries to be answered many times faster than with alternative disk-based suffix trees. A drawback of their LOF-SA structure, common to all current disk-resident suffix tree and suffix array approaches, is that the space requirement of the data structure, though on disk, is large relative to the text: for the LOF-SA, 13n bytes, including the underlying n-byte text. In this paper we explore techniques for reducing the space required by the LOF-SA. Experiments show that these methods cut the data structure to nearly half its original size, with no impact on search times for strings large enough to necessitate on-disk structures.
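
For readers unfamiliar with the underlying structure, the following is a minimal in-memory sketch of a plain suffix array and the binary-search pattern matching it supports. It is not the LOF-SA, which adds a trie and a blocked on-disk layout that are not reproduced here. The function names build_suffix_array and find_occurrences are hypothetical, chosen for illustration, and the naive sort-based construction is an assumption made for brevity rather than a method taken from the paper.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction (an illustrative assumption): sort positions by the
    suffix that begins there. Fine for small examples, too slow for large texts.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted positions in text where pattern occurs.

    All suffixes that begin with pattern are adjacent in the suffix array,
    so two binary searches locate the boundaries of that run.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
matches = find_occurrences(text, sa, "ana")
```

Each binary-search step compares the pattern against a suffix stored at an arbitrary position in the text, which is the kind of access pattern that becomes expensive once the array and the text live on disk rather than in memory.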