Quasi-distinct Parsing and Optimal Compression Methods

Authors:
Amihood Amir; Yonatan Aumann; Avivit Levy; Yuri Roshko
Affiliations:
Department of Computer Science, Bar Ilan University, Ramat Gan, Israel 52900 and Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218; Department of Computer Science, Bar Ilan University, Ramat Gan, Israel 52900; Shenkar College, Ramat Gan, Israel 52526 and CRI, Haifa University, Haifa, Israel 31905; Shenkar College, Ramat Gan, Israel 52526
Venue:
CPM '09 Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching
Year:
2009

Abstract

In this paper, the optimality proof of Lempel-Ziv coding is revisited, and a much more general compression optimality theorem is derived. In particular, we define the property of quasi-distinct parsing. This property is much weaker than the distinct parsing required in the original proof, yet we show that the theorem still holds under this weaker property. This provides a better understanding of the optimality proof of Lempel-Ziv coding, together with a new tool for proving the optimality of other compression schemes. To demonstrate a possible use of this generalization, we present a new coding method, APT coding. APT coding is based on a principle very different from that of Lempel-Ziv coding; moreover, it does not directly define any parsing technique. Nevertheless, we analyze APT coding and, using the generalized theorem, show that it is asymptotically optimal up to a constant factor, provided the APT quasi-distinctness hypothesis holds. Empirical evidence that this hypothesis holds is also given.
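To make the notion of a distinct parsing concrete, here is a minimal Python sketch of greedy LZ78-style parsing, in which each new phrase is the shortest prefix of the remaining input not yet seen as a phrase. This is the kind of parsing the classical Lempel-Ziv optimality proof relies on. The function names `lz78_parse` and `max_multiplicity` are hypothetical. The abstract does not state the formal definition of quasi-distinctness, so `max_multiplicity` is only an illustrative measure of how far a parsing is from distinct (a distinct parsing scores 1). It is not the paper's definition, and APT coding is not modeled here.

```python
from collections import Counter


def lz78_parse(text: str) -> list[str]:
    """Greedy LZ78-style parsing: each phrase is the shortest prefix of the
    remaining text that has not occurred as an earlier phrase."""
    phrases: list[str] = []
    seen: set[str] = set()
    i = 0
    while i < len(text):
        j = i + 1
        # Extend the candidate phrase while it is already a known phrase.
        while j <= len(text) and text[i:j] in seen:
            j += 1
        # If the input ends mid-phrase, this last phrase may repeat an earlier one.
        phrase = text[i:j]
        phrases.append(phrase)
        seen.add(phrase)
        i = j
    return phrases


def max_multiplicity(phrases: list[str]) -> int:
    """Hypothetical illustrative measure: the largest number of times any
    single phrase occurs in the parsing. A distinct parsing has value 1."""
    return max(Counter(phrases).values(), default=0)
```

For the classic binary example `1011010100010`, `lz78_parse` returns the phrases `1, 0, 11, 01, 010, 00, 10`, all distinct, so `max_multiplicity` is 1. When the input ends mid-phrase, the last phrase can repeat an earlier one. For example, `aaaa` parses as `a, aa, a`, which gives multiplicity 2.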