Reducing the space requirement of LZ-Index

Authors:
Diego Arroyuelo;Gonzalo Navarro;Kunihiko Sadakane
Affiliations:
Dept. of Computer Science, Universidad de Chile;Dept. of Computer Science, Universidad de Chile;Dept. of Computer Science and Communication Engineering, Kyushu University, Japan
Venue:
CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
Year:
2006

Abstract

The LZ-index is a compressed full-text self-index able to represent a text $T_{1 \ldots u}$, over an alphabet of size $\sigma = O(\textrm{polylog}(u))$ and with $k$-th order empirical entropy $H_k(T)$, using $4uH_k(T) + o(u\log\sigma)$ bits for any $k = o(\log_\sigma u)$. It can report all the $occ$ occurrences of a pattern $P_{1 \ldots m}$ in $T$ in $O(m^3\log\sigma + (m + occ)\log u)$ worst-case time. Its main drawback is the factor 4 in its space complexity, which makes it larger than other state-of-the-art alternatives. In this paper we present two different approaches to reduce the space requirement of the LZ-index. In both cases we achieve $(2 + \epsilon)uH_k(T) + o(u\log\sigma)$ bits of space, for any constant $\epsilon > 0$, and we simultaneously improve the search time to $O(m^2\log m + (m + occ)\log u)$. Both indexes support displaying any subtext of length $\ell$ in optimal $O(\ell/\log_\sigma u)$ time. In addition, we show how the space can be squeezed to $(1 + \epsilon)uH_k(T) + o(u\log\sigma)$ bits to obtain a structure with $O(m^2)$ average search time for $m \geqslant 2\log_\sigma u$.
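To make the space bounds concrete, the following is a minimal sketch, not taken from the paper, that computes the k-th order empirical entropy of a string using the standard definition and then evaluates only the leading term of a bound such as 4uHk(T) or (2 + ε)uHk(T). The function names (`empirical_entropy_h0`, `empirical_entropy_hk`, `leading_space_bits`) are hypothetical, and the lower-order o(u log σ) term is deliberately ignored, so the numbers are illustrative rather than actual index sizes.

```python
import math
from collections import Counter, defaultdict


def empirical_entropy_h0(s):
    """Zero-order empirical entropy of s, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    counts = Counter(s)
    return sum(c / n * math.log2(n / c) for c in counts.values())


def empirical_entropy_hk(text, k):
    """k-th order empirical entropy H_k(text), in bits per symbol.

    Standard definition: group every symbol by the k symbols that
    precede it, then take the length-weighted average of the
    zero-order entropy of each group, normalised by the text length.
    """
    u = len(text)
    if u == 0:
        return 0.0
    if k == 0:
        return empirical_entropy_h0(text)
    followers = defaultdict(list)
    for i in range(k, u):
        followers[text[i - k:i]].append(text[i])
    total = sum(len(f) * empirical_entropy_h0(f) for f in followers.values())
    return total / u


def leading_space_bits(text, k, factor):
    """Leading term factor * u * H_k(T) of a space bound.

    Assumption: the o(u log sigma) term is dropped, so this is only
    the dominant part of the bound, not a real index size.
    """
    return factor * len(text) * empirical_entropy_hk(text, k)
```

For example, `leading_space_bits(text, k, 4)` and `leading_space_bits(text, k, 2 + eps)` give the dominant terms of the original and the reduced bounds for the same text, which shows the roughly twofold saving in the leading term when ε is small.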