LZ77-Like Compression with Fast Random Access

Authors:
Sebastian Kreft;Gonzalo Navarro
Affiliations:
-;-
Venue:
DCC '10 Proceedings of the 2010 Data Compression Conference
Year:
2010

Citing 0
Cited 11

Lempel-Ziv factorization revisited
CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching

Self-indexing based on LZ77
CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching

Indexes for highly repetitive document collections
Proceedings of the 20th ACM international conference on Information and knowledge management

Iterative Dictionary Construction for Compression of Large DNA Data Sets
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Relative Lempel-Ziv factorization for efficient storage and retrieval of web collections
Proceedings of the VLDB Endowment

Grammar-based compression in a streaming model
LATA'10 Proceedings of the 4th international conference on Language and Automata Theory and Applications

Faster approximate pattern matching in compressed repetitive texts
ISAAC'11 Proceedings of the 22nd international conference on Algorithms and Computation

Fast relative Lempel-Ziv self-index for similar sequences
FAW-AAIM'12 Proceedings of the 6th international Frontiers in Algorithmics, and Proceedings of the 8th international conference on Algorithmic Aspects in Information and Management

Self-Indexed Grammar-Based Compression
Fundamenta Informaticae

On compressing and indexing repetitive sequences
Theoretical Computer Science

FRESCO: Referential Compression of Highly Similar Sequences
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Abstract

We introduce an alternative Lempel-Ziv text parsing, LZ-End, that converges to the entropy and in practice gets very close to LZ77. LZ-End forces the source of each phrase to finish at the end of a previous phrase. Most Lempel-Ziv parsings can decompress the text only from the beginning. LZ-End is the only parsing we know of that can decompress arbitrary phrases in optimal time while staying closely competitive with LZ77, especially on highly repetitive collections, where LZ77 excels. Thus LZ-End is ideal as a compression format for highly repetitive sequence databases, where access to individual sequences is required, and it also opens the door to compressed indexing schemes for such collections.
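
The constraint described in the abstract can be illustrated with a small sketch. The Python code below is not the authors' implementation: it is a naive parser written only to show the idea, assuming each phrase is stored as a triple (source phrase, copy length, explicit trailing character) whose copied part is a suffix of the text ending at the end of an earlier phrase. All function names are hypothetical, and the code makes no attempt to reach the time or space bounds the paper claims.

```python
def lz_end_parse(text):
    """Naive LZ-End-style parse into (source_phrase, copy_length, char) triples.

    The copied part of each phrase must be a suffix of the text that ends
    exactly where some earlier phrase ends. source_phrase is -1 when
    nothing is copied. Brute force, for illustration only.
    """
    phrases, ends = [], []  # ends[k] = exclusive end position of phrase k
    i, n = 0, len(text)
    while i < n:
        best_len, best_src = 0, -1
        max_len = n - i - 1  # keep one character for the explicit tail
        for q, e in enumerate(ends):
            # Try longer matches first; only lengths beating best_len matter.
            for l in range(min(e, max_len), best_len, -1):
                if text[i:i + l] == text[e - l:e]:
                    best_len, best_src = l, q
                    break
        phrases.append((best_src, best_len, text[i + best_len]))
        i += best_len + 1
        ends.append(i)
    return phrases


def phrase_ends(phrases):
    """Exclusive end position of every phrase in the original text."""
    ends, pos = [], 0
    for _, length, _ in phrases:
        pos += length + 1
        ends.append(pos)
    return ends


def lz_end_decode(phrases):
    """Sequential decoding from the beginning, as with any LZ parsing."""
    out, ends = [], []
    for src, length, ch in phrases:
        if length:
            e = ends[src]
            out.extend(out[e - length:e])
        out.append(ch)
        ends.append(len(out))
    return "".join(out)


def extract_suffix(phrases, p, m):
    """Return the last m characters of the text ending at the end of phrase p.

    Only phrases reachable through source pointers are visited; the text
    before them is never decoded. Requires m <= end position of phrase p.
    """
    if m == 0:
        return ""
    src, length, ch = phrases[p]
    need = m - 1  # characters wanted before the explicit trailing character
    if need <= length:
        # Entirely inside the copied part, which ends at the end of phrase src.
        return extract_suffix(phrases, src, need) + ch
    # Spills into earlier phrases: the remainder ends at the end of phrase p-1.
    return (extract_suffix(phrases, p - 1, need - length)
            + extract_suffix(phrases, src, length)
            + ch)
```

As a usage example under the same assumptions, `extract_suffix(phrases, p, phrases[p][1] + 1)` recovers phrase `p` alone. Because every source ends at a phrase boundary, each recursive call is again a request for a suffix ending at a phrase end, which is the property that makes direct access possible without decoding from the start.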