A compressed self-index using a Ziv-Lempel dictionary

Authors:
Luís M. Russo; Arlindo L. Oliveira
Affiliations:
INESC-ID/IST, Lisboa, Portugal 1049-001; INESC-ID/IST, Lisboa, Portugal 1049-001
Venue:
Information Retrieval
Year:
2008

Abstract

A compressed full-text self-index for a text $T$ of size $u$ is a data structure that supports searching for patterns $P$ of size $m$ in $T$ while requiring reduced space, i.e., space that depends on the empirical entropy ($H_k$ or $H_0$) of $T$, and that can, furthermore, reproduce any substring of $T$. In this paper we present a new compressed self-index able to locate the occurrences of $P$ in $O((m + occ)\log u)$ time, where $occ$ is the number of occurrences. The fundamental improvement over previous LZ78-based indexes is the reduction of the search time dependency on $m$ from $O(m^2)$ to $O(m)$. To achieve this result we point out the main obstacle to linear-time algorithms based on LZ78 data compression, and we expose and explore the nature of a recurrent structure in LZ-indexes, the $\mathcal{T}_{78}$ suffix tree. We show that our method is very competitive in practice by comparing it against other state-of-the-art compressed indexes.
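
For readers unfamiliar with the LZ78 parsing on which such indexes are built, the following is a minimal sketch of the textbook LZ78 phrase decomposition. It is not the index described in the paper, and the names `lz78_parse` and `lz78_decode` are hypothetical, chosen only for this illustration. Each phrase is stored as a pair (id of an earlier phrase, one extra character), so the set of phrases forms a trie; an occurrence of a pattern may lie inside one phrase or span several consecutive phrases.

```python
def lz78_parse(text):
    """Split text into LZ78 phrases.

    Returns a list of (parent_id, char) pairs. Phrase ids start at 1;
    id 0 denotes the empty phrase (the trie root).
    """
    trie = {}      # (parent_id, char) -> phrase id of the child
    phrases = []   # phrases[i - 1] describes the phrase with id i
    node = 0       # current trie node while extending a phrase
    for ch in text:
        key = (node, ch)
        if key in trie:
            # Still inside a known phrase: descend in the trie.
            node = trie[key]
        else:
            # New phrase = known phrase `node` + character `ch`.
            phrases.append((node, ch))
            trie[key] = len(phrases)
            node = 0
    if node != 0:
        # The text ended in the middle of a known phrase; emit it
        # with an empty extra character so decoding stays lossless.
        phrases.append((node, ""))
    return phrases


def lz78_decode(phrases):
    """Rebuild the original text from the (parent_id, char) pairs."""
    expanded = [""]  # expanded[i] is the string of the phrase with id i
    for parent, ch in phrases:
        expanded.append(expanded[parent] + ch)
    return "".join(expanded[1:])


if __name__ == "__main__":
    parsed = lz78_parse("abababab")
    print(parsed)
    print(lz78_decode(parsed))
```

The dictionary `trie` plays the role of the Ziv-Lempel dictionary in this sketch; a real self-index would replace it with a succinct tree representation and add the structures needed for searching, which are outside the scope of this example.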