A compressed full-text self-index is a data structure that replaces a text and also provides indexed access to it, while taking space proportional to the compressed text size. This matters in practice because the index of a very large text can then be held entirely in main memory, avoiding slower accesses to secondary storage. In particular, the LZ-index [G. Navarro, Indexing text using the Ziv-Lempel trie, Journal of Discrete Algorithms (JDA) 2 (1) (2004) 87-114] stands out for its good performance at extracting text passages and locating pattern occurrences. Given a text $T[1..u]$ over an alphabet of size $\sigma$, the LZ-index requires $4|LZ|(1+o(1))$ bits of space, where $|LZ|$ is the size of the LZ78 compression of $T$. This can be bounded by $|LZ| = uH_k(T) + o(u \log \sigma)$, where $H_k(T)$ is the $k$-th order empirical entropy of $T$, for any $k = o(\log_\sigma u)$. The LZ-index is built in $O(u \log \sigma)$ time, but its construction requires $O(u \log u)$ bits of main memory in the worst case. In practice, the LZ-index occupies 1.0-1.5 times the text size (and replaces the text), but its construction requires around 5 times the text size. This limits its applicability to medium-sized texts. In this paper we present a space-efficient algorithm that constructs the LZ-index in $O(u(\log \sigma + \log\log u))$ time using $4|LZ|(1+o(1))$ bits of main memory, that is, asymptotically the same space as the final index. We also adapt our algorithm to construct more recent reduced versions of the LZ-index, which occupy from 1 to 3 times $|LZ|(1+o(1))$ bits, and show that these can also be built using asymptotically the same space as the final index. Finally, we study an alternative model in which only a limited amount of main memory is available for the indexing process (less than that required by the final index), and the disk must be used for the rest. We show how to build all the LZ-index variants in $O(u(\log \sigma + \log\log u))$ time and within $|LZ|(1+o(1))$ bits of main memory, that is, asymptotically just the space needed to hold the LZ78-compressed text. Our experimental results show that our method is efficient in practice, needing an amount of memory close to that of the final index, and is competitive with the best construction times of other compressed indexes.
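To make the quantity behind the space bounds more concrete, the following is a minimal Python sketch of plain LZ78 parsing, the compression scheme whose output size the abstract uses as its yardstick. It is not the construction algorithm of the paper: it uses an ordinary dictionary-based trie with no attempt at space efficiency. The function names `lz78_parse`, `lz78_decode`, and `estimated_bits` are hypothetical, and the bit estimate assumes the textbook encoding of each phrase as a (phrase id, character) pair, which may differ from the exact definition of the compressed size used in the paper.

```python
from math import ceil, log2


def lz78_parse(text):
    """Split text into LZ78 phrases, returned as (prefix_id, char) pairs.

    Phrase id 0 is the empty phrase. Each new phrase is the longest
    previously seen phrase extended by one character.
    """
    trie = {}      # maps (parent phrase id, char) -> child phrase id
    phrases = []
    node = 0       # current position in the trie (0 = root)
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            node = child                      # keep extending the match
        else:
            trie[(node, ch)] = len(phrases) + 1
            phrases.append((node, ch))        # new phrase = match + ch
            node = 0
    if node != 0:
        # The text ended inside an existing phrase; emit it with no
        # extra character (an assumption made for this sketch).
        phrases.append((node, ""))
    return phrases


def lz78_decode(phrases):
    """Rebuild the original text from the (prefix_id, char) pairs."""
    out = [""]
    for prefix_id, ch in phrases:
        out.append(out[prefix_id] + ch)
    return "".join(out[1:])


def estimated_bits(phrases, sigma):
    """Rough output size: one phrase id plus one character per phrase.

    Assumes fixed-width fields; this is an illustrative estimate only.
    """
    n = len(phrases)
    return n * (ceil(log2(n + 1)) + ceil(log2(sigma)))


if __name__ == "__main__":
    parsed = lz78_parse("abababab")
    print(len(parsed), "phrases:", parsed)
    print("estimated size in bits:", estimated_bits(parsed, sigma=2))
```

The fewer phrases the parsing produces, the smaller the compressed size, so repetitive texts yield a smaller index under the bounds stated above.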