Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
Hierarchies of indices for text searching
Information Systems
The string B-tree: a new data structure for string search in external memory and its applications
Journal of the ACM (JACM)
Fast string searching in secondary storage: theoretical developments and experimental results
Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms
Efficient suffix trees on secondary storage
Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms
PATRICIA—Practical Algorithm To Retrieve Information Coded in Alphanumeric
Journal of the ACM (JACM)
Reducing the space requirement of suffix trees
Software—Practice & Experience
Compression of Low Entropy Strings with Lempel--Ziv Algorithms
SIAM Journal on Computing
Fast and flexible word searching on compressed text
ACM Transactions on Information Systems (TOIS)
An analysis of the Burrows—Wheeler transform
Journal of the ACM (JACM)
Succinct representations of lcp information and improvements in the compressed suffix arrays
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Succinct Representation of Balanced Parentheses and Static Trees
SIAM Journal on Computing
Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science
Indexing text using the Ziv-Lempel trie
Journal of Discrete Algorithms - SPIRE 2002
Journal of the ACM (JACM)
ACM Computing Surveys (CSUR)
Compressed Text Indexes with Fast Locate
CPM '07 Proceedings of the 18th annual symposium on Combinatorial Pattern Matching
Reducing the space requirement of LZ-Index
CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
Space-efficient construction of LZ-index
ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
ISAAC'04 Proceedings of the 15th international conference on Algorithms and Computation
Compression of individual sequences via variable-rate coding
IEEE Transactions on Information Theory
Algorithms and data structures for external memory
Foundations and Trends® in Theoretical Computer Science
Implementing the LZ-index: Theory versus practice
Journal of Experimental Algorithmics (JEA)
An Improved Succinct Representation for Dynamic k-ary Trees
CPM '08 Proceedings of the 19th annual symposium on Combinatorial Pattern Matching
On Entropy-Compressed Text Indexing in External Memory
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Compression, indexing, and retrieval for massive string data
CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Data structures: time, I/Os, entropy, joules!
ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part II
Practical approaches to reduce the space requirement of lempel-ziv--based compressed text indices
Journal of Experimental Algorithmics (JEA)
Hi-index | 0.00 |
Full-text searching consists in locating the occurrences of a given pattern P[1..m] in a text T[1..u], both sequences over an alphabet of size σ. In this paper we define a new index for full-text searching on secondary storage, based on the Lempel-Ziv compression algorithm and requiring 8uHk +o(u log σ) bits of space, where Hk denotes the k-th order empirical entropy of T, for any k = o(logσ u). Our experimental results show that our index is significantly smaller than any other practical secondary-memory data structure: 1.4-2.3 times the text size including the text, which means 39%-65% the size of traditional indexes like String B-trees [Ferragina and Grossi, JACM 1999]. In exchange, our index requires more disk access to locate the pattern occurrences. Our index is able to report up to 600 occurrences per disk access, for a disk page of 32 kilobytes. If we only need to count pattern occurrences, the space can be reduced to about 1.04-1.68 times the text size, requiring about 20-60 disk accesses, depending on the pattern length.