Indexing text using the Ziv-Lempel trie

Authors:
Gonzalo Navarro
Affiliations:
Department of Computer Science, Univ. of Chile, Blanco Encalada 2120, Santiago, Chile
Venue:
Journal of Discrete Algorithms - SPIRE 2002
Year:
2004

Abstract

Let a text of u characters over an alphabet of size σ be compressible to n phrases by the LZ78 algorithm. We show how to build a data structure based on the Ziv-Lempel trie, called the LZ-index, that takes $4n \log_2 n \, (1 + o(1))$ bits of space (that is, 4 times the entropy of the text for ergodic sources) and reports the R occurrences of a pattern of length m in worst case time $O(m^3 \log \sigma + (m + R) \log n)$. We present a practical implementation of the LZ-index, which is faster than current alternatives when we take into consideration the time to report the positions or text contexts of the occurrences found.
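
To make the quantities u and n in the abstract concrete, the following is a minimal sketch of a textbook LZ78 parse that builds the Ziv-Lempel trie of phrases. It is not the LZ-index itself and is not taken from the paper's implementation. The function names (`lz78_parse`, `phrase_text`, `leading_space_term_bits`), the dictionary-based trie, and the handling of an unfinished final phrase are all assumptions made for illustration.

```python
import math


def lz78_parse(text):
    """Split `text` into LZ78 phrases, returned as (parent_id, char) pairs.

    Phrase ids start at 1; id 0 is the empty phrase (the trie root).
    Each phrase is a previously seen phrase extended by one character.
    """
    trie = {}      # (parent_id, char) -> phrase_id, i.e. the trie edges
    phrases = []   # phrases[i - 1] describes phrase i
    node = 0       # current trie node while reading one phrase
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            node = child                      # keep extending the current phrase
        else:
            phrases.append((node, ch))        # new phrase = node's phrase + ch
            trie[(node, ch)] = len(phrases)
            node = 0                          # restart at the root
    if node != 0:
        # Assumption: if the text ends inside a known phrase, emit it with
        # an empty extension character instead of appending a terminator.
        phrases.append((node, ""))
    return phrases


def phrase_text(phrases, phrase_id):
    """Recover the characters of one phrase by walking up to the root."""
    chars = []
    while phrase_id != 0:
        parent, ch = phrases[phrase_id - 1]
        chars.append(ch)
        phrase_id = parent
    return "".join(reversed(chars))


def leading_space_term_bits(n):
    """Leading term 4 n log2 n of the stated space bound, ignoring o(1)."""
    return 4 * n * math.log2(n)


text = "abracadabra"
phrases = lz78_parse(text)
decoded = [phrase_text(phrases, i) for i in range(1, len(phrases) + 1)]
print(decoded)               # ['a', 'b', 'r', 'ac', 'ad', 'ab', 'ra']
print(len(text), len(phrases))  # u = 11, n = 7
print(leading_space_term_bits(len(phrases)))
```

On such a tiny input the leading term says nothing about real index size, since the o(1) factor dominates; the sketch is only meant to show how u characters turn into n trie nodes.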