The input/output complexity of sorting and related problems
Communications of the ACM
Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
The string B-tree: a new data structure for string search in external memory and its applications
Journal of the ACM (JACM)
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)
High-order entropy-compressed text indexes
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
COCOON '96 Proceedings of the Second Annual International Conference on Computing and Combinatorics
New text indexing functionalities of the compressed suffix arrays
Journal of Algorithms
Journal of the ACM (JACM)
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
SIAM Journal on Computing
ACM Computing Surveys (CSUR)
Compressed representations of sequences and full-text indexes
ACM Transactions on Algorithms (TALG)
Compressed Suffix Trees with Full Functionality
Theory of Computing Systems
Geometric Burrows-Wheeler Transform: Linking Range Searching and Text Indexing
DCC '08 Proceedings of the Data Compression Conference
Compressed Index for Dictionary Matching
DCC '08 Proceedings of the Data Compression Conference
Linear pattern matching algorithms
SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Efficient Data Structures for the Orthogonal Range Successor Problem
COCOON '09 Proceedings of the 15th Annual International Conference on Computing and Combinatorics
A Lempel-Ziv text index on secondary storage
CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
Compression, indexing, and retrieval for massive string data
CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Data structures: time, I/Os, entropy, joules!
ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part II
ACM Transactions on Algorithms (TALG)
Compressed text indexing with wildcards
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
A faster grammar-based self-index
LATA'12 Proceedings of the 6th international conference on Language and Automata Theory and Applications
Computing Lempel-Ziv factorization online
MFCS'12 Proceedings of the 37th international conference on Mathematical Foundations of Computer Science
Compressed text indexing with wildcards
Journal of Discrete Algorithms
A new trend in the field of pattern matching is to design indexing data structures that take space very close to that required by the indexed text (in entropy-compressed form) while also achieving good query performance. Two popular indexes, namely the FM-index [Ferragina and Manzini 2005] and the CSA [Grossi and Vitter 2005], achieve this goal by exploiting the Burrows-Wheeler transform (BWT) [Burrows and Wheeler 1994]. However, due to the intricate permutation structure of the BWT, no locality of reference can be guaranteed when we perform pattern matching with these indexes. Chien et al. [2008] gave an alternative text index based on sparsifying the traditional suffix tree and maintaining an auxiliary 2-D range query structure. Given a text T of length n drawn from a σ-sized alphabet set, they achieved an O(n log σ)-bit index for T and showed that this index can preserve locality in pattern matching and hence is amenable to use in external-memory settings. We improve upon this index and show how to apply entropy compression to reduce index space. Our index takes O(n(H_k + 1)) + o(n log σ) bits of space, where H_k is the kth-order empirical entropy of the text. This is achieved by creating variable-length blocks of text using arithmetic coding.
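To make the entropy term in the space bound concrete, the following is a minimal Python sketch that computes the kth-order empirical entropy of a string under its standard definition: group each character by the k characters that precede it, take the zeroth-order entropy of each group, and average the results weighted by group size. The function names `zeroth_order_bits` and `empirical_entropy` are illustrative assumptions, not part of the paper, and the sketch says nothing about how the index itself is built.

```python
from collections import Counter, defaultdict
from math import log2


def zeroth_order_bits(counts):
    """Total bits m * H_0 for a multiset of symbols given as a Counter."""
    m = sum(counts.values())
    return -sum(c * log2(c / m) for c in counts.values())


def empirical_entropy(text, k):
    """k-th order empirical entropy of `text`, in bits per symbol.

    Hypothetical helper for illustration: each symbol is grouped by the
    k symbols that precede it (its context), and the zeroth-order cost
    of every group is summed and divided by the text length n.
    """
    n = len(text)
    if n == 0:
        return 0.0
    followers = defaultdict(Counter)
    for i in range(k, n):
        context = text[i - k:i]  # empty string when k == 0
        followers[context][text[i]] += 1
    total_bits = sum(zeroth_order_bits(c) for c in followers.values())
    return total_bits / n


if __name__ == "__main__":
    sample = "abracadabra" * 4
    for order in range(3):
        print(order, round(empirical_entropy(sample, order), 4))
```

On repetitive inputs the value drops as k grows, which is why a bound stated in terms of the kth-order entropy can be far below the n log σ bits of the uncompressed text.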