Space-efficient construction of LZ-index

Authors:
Diego Arroyuelo; Gonzalo Navarro
Affiliations:
Dept. of Computer Science, University of Chile, Santiago, Chile (both authors)
Venue:
ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
Year:
2005

Abstract

A compressed full-text self-index is a data structure that replaces a text and, in addition, gives indexed access to it, while taking space proportional to the compressed text size. The LZ-index, in particular, requires $4uH_k(1+o(1))$ bits of space, where $u$ is the text length in characters and $H_k$ is its $k$-th order empirical entropy. Although in practice the LZ-index needs 1.0 to 1.5 times the text size, its construction requires much more main memory (around 5 times the text size), which limits its applicability to large texts. In this paper we present a practical space-efficient algorithm to construct the LZ-index, requiring $(4+\varepsilon)uH_k + o(u)$ bits of space, for any constant $0 < \varepsilon < 1$, and $O(\sigma u)$ time, where $\sigma$ is the alphabet size. Our experimental results show that our method is efficient in practice, needing an amount of memory close to that of the final index.
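To make the space bounds above concrete, the following Python sketch computes the $k$-th order empirical entropy using one common definition, $H_k = \frac{1}{u}\sum_{w} |T_w|\,H_0(T_w)$, where $T_w$ collects the characters that follow each length-$k$ context $w$ in the text. It then evaluates only the leading term $4uH_k$ (index size) or $(4+\varepsilon)uH_k$ (construction memory), deliberately dropping the $o(\cdot)$ terms. The function names `h0`, `hk` and `lz_index_bits` are hypothetical illustrations; this is not the authors' construction algorithm, only a way to estimate the quantities the bounds refer to.

```python
from collections import Counter, defaultdict
from math import log2


def h0(s):
    """Zero-order empirical entropy of a sequence s, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    # Sum of (c/n) * log2(n/c) over symbol counts c; never negative.
    return sum(c / n * log2(n / c) for c in Counter(s).values())


def hk(text, k):
    """k-th order empirical entropy H_k(text), in bits per symbol.

    For every length-k context w, gather the symbols that follow
    occurrences of w, and weight their H_0 by how many there are.
    """
    u = len(text)
    if k == 0:
        return h0(text)
    followers = defaultdict(list)
    for i in range(u - k):
        followers[text[i:i + k]].append(text[i + k])
    return sum(len(f) * h0(f) for f in followers.values()) / u


def lz_index_bits(text, k, eps=0.0):
    """Leading term (4 + eps) * u * H_k only; lower-order terms are ignored.

    eps = 0 approximates the final index size; 0 < eps < 1 approximates
    the construction memory bound (still without the o(u) term).
    """
    return (4 + eps) * len(text) * hk(text, k)


if __name__ == "__main__":
    sample = "abracadabra abracadabra abracadabra"
    for k in range(3):
        print(k, round(hk(sample, k), 3), round(lz_index_bits(sample, k), 1))
```

Note that on short inputs and large $k$, $H_k$ collapses toward 0 and the ignored lower-order terms dominate, so the leading-term estimate is only indicative for long texts and small $k$.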