We introduce an alternative Lempel-Ziv text parsing, LZ-End, that converges to the entropy and in practice gets very close to LZ77. LZ-End forces the source of each phrase to finish at the end of a previous phrase. Most Lempel-Ziv parsings can decompress the text only from the beginning. LZ-End is the only parsing we know of that can decompress arbitrary phrases in optimal time while staying closely competitive with LZ77, especially on highly repetitive collections, where LZ77 excels. Thus LZ-End is ideal as a compression format for highly repetitive sequence databases where access to individual sequences is required, and it also opens the door to compressed indexing schemes for such collections.
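
To make the phrase-boundary constraint concrete, here is a minimal Python sketch of an LZ-End style parsing. It is a quadratic-time illustration, not the construction or the optimal-time extraction algorithm of the paper. It assumes the usual formulation in which each phrase is the longest prefix of the remaining text that is a suffix of the text up to the end of some earlier phrase, followed by one explicit character. The names `lz_end_parse`, `lz_end_decode` and `extract_suffix` are hypothetical.

```python
def lz_end_parse(text):
    """Naive LZ-End style parse: list of (source_phrase, copy_len, char) triples.

    The copied part of each phrase must END exactly where an earlier
    phrase ends; source_phrase is None when nothing is copied.
    """
    phrases, ends = [], []   # ends[q] = exclusive end position of phrase q
    i, n = 0, len(text)
    while i < n:
        best_len, best_q = 0, None
        for q, e in enumerate(ends):
            # Longest l with text[e-l:e] == text[i:i+l]; keep one char for the phrase tail.
            for l in range(min(e, n - i - 1), best_len, -1):
                if text[e - l:e] == text[i:i + l]:
                    best_len, best_q = l, q
                    break
        phrases.append((best_q, best_len, text[i + best_len]))
        i += best_len + 1
        ends.append(i)
    return phrases


def lz_end_decode(phrases):
    """Sequential decoding from the beginning of the text."""
    out, ends = [], []
    for q, copy_len, ch in phrases:
        if copy_len:
            e = ends[q]
            out.extend(out[e - copy_len:e])
        out.append(ch)
        ends.append(len(out))
    return "".join(out)


def extract_suffix(phrases, q, length):
    """Return the `length` characters ending at the end of phrase q.

    Only the phrases the recursion actually touches are visited. Every
    recursive call again ends at a phrase boundary, which is exactly
    what the LZ-End constraint guarantees.
    """
    if length == 0:
        return ""
    src, copy_len, ch = phrases[q]
    rest = length - 1
    if rest <= copy_len:
        head = extract_suffix(phrases, src, rest)
    else:
        head = (extract_suffix(phrases, q - 1, rest - copy_len)
                + extract_suffix(phrases, src, copy_len))
    return head + ch


phrases = lz_end_parse("alabar_a_la_alabarda$")
# Phrase k on its own, without decoding the phrases before it:
k = len(phrases) - 1
last_phrase = extract_suffix(phrases, k, phrases[k][1] + 1)
```

Because the last character of a phrase is stored explicitly and the rest is a suffix ending at an earlier phrase boundary, `extract_suffix` never needs to locate a position in the middle of a phrase. This sketch relies on string concatenation and plain recursion, so it shows why random access is possible rather than how to achieve the time bound stated above.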