Fast Data Compression with Antidictionaries

Authors:
Michael Davidson;Lucian Ilie
Affiliations:
Department of Computer Science, University of Western Ontario, N6A 5B7, London, Ontario, Canada. ilie@csd.uwo.ca;(Corresponding author) Department of Computer Science, University of Western Ontario, N6A 5B7, London, Ontario, Canada. ilie@csd.uwo.ca
Venue:
Fundamenta Informaticae - Contagious Creativity - In Honor of the 80th Birthday of Professor Solomon Marcus
Year:
2005

References:
Contextual grammars and natural languages. Handbook of formal languages, vol. 2.
Automata for matching patterns. Handbook of formal languages, vol. 2.
Efficient string matching: an aid to bibliographic search. Communications of the ACM.
Marcus Contextual Grammars.
Text Compression Using Antidictionaries. ICALP '99 Proceedings of the 26th International Colloquium on Automata, Languages and Programming.
Pattern Matching in Text Compressed by Using Antidictionaries. CPM '99 Proceedings of the 10th Annual Symposium on Combinatorial Pattern Matching.
On the Complexity of Finite Sequences. IEEE Transactions on Information Theory.

Abstract

We consider data compression using antidictionaries and give algorithms for faster compression and decompression. While the original method of Crochemore et al. uses finite transducers with ε-moves, we (de)compress using ε-free transducers. This is provably faster, assuming the data is non-negligibly compressible, but we have to consider the overhead of building the new machines. In general, the ε-free machines can be quadratic in size compared to the ones allowing ε-moves; we prove this bound optimal, as it is reached for de Bruijn words. In practice, however, the size of the ε-free machines turns out to be close to the size of the ones allowing ε-moves, and therefore we can achieve significantly faster (de)compression. We show our results for the files in the Calgary corpus.
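
To make the underlying idea concrete, the following is a minimal sketch of antidictionary coding on binary strings. It is not the transducer-based algorithm of the paper or of Crochemore et al.; it is a naive illustration of the principle they rely on: if a word u followed by bit b never occurs in the text, then after reading u the next bit must be the complement of b and can be omitted. The function names `predicted_bit`, `compress`, and `decompress` are hypothetical, the antidictionary is assumed to be given, and the text length is assumed to be sent alongside the code.

```python
def predicted_bit(prefix, antidict):
    """Return the forced next bit after `prefix`, or None if it is not forced.

    A forbidden word u+b (u a suffix of prefix) means the next bit cannot
    be b, so over the binary alphabet it must be the complement of b.
    """
    for word in antidict:
        u, b = word[:-1], word[-1]
        if prefix.endswith(u):
            return "1" if b == "0" else "0"
    return None


def compress(text, antidict):
    """Drop every bit of `text` that the antidictionary already determines.

    Naive scan (illustrative only); returns the kept bits and the length.
    """
    kept = []
    for i, bit in enumerate(text):
        forced = predicted_bit(text[:i], antidict)
        if forced is None:
            kept.append(bit)          # unpredictable bit: must be stored
        elif forced != bit:
            # Assumption violated: a "forbidden" word occurs in the text.
            raise ValueError("antidictionary contains a factor of the text")
    return "".join(kept), len(text)


def decompress(code, length, antidict):
    """Rebuild the text by re-inserting the forced bits."""
    text = ""
    stored = iter(code)
    for _ in range(length):
        forced = predicted_bit(text, antidict)
        text += forced if forced is not None else next(stored)
    return text


if __name__ == "__main__":
    antidict = ["00", "11"]           # neither word occurs in the text below
    code, n = compress("0101010101", antidict)
    print(code, n)
    print(decompress(code, n, antidict))
```

With the antidictionary {00, 11}, every bit after the first is forced, so only the first bit of 0101010101 is stored. This sketch rescans the antidictionary at every position; the transducers discussed in the abstract exist precisely to avoid that cost.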