Implementing the Context Tree Weighting Method for Text Compression

Authors:
Kunihiko Sadakane;Takumi Okazaki;Hiroshi Imai
Venue:
DCC '00 Proceedings of the Conference on Data Compression
Year:
2000

References (9)

A new challenge for compression algorithms: genetic sequences
    Information Processing and Management: an International Journal - Special issue: data compression
Arithmetic coding for data compression
    Communications of the ACM
The entropy of English using PPM-based models
    DCC '96 Proceedings of the Conference on Data Compression
Context-Tree Weighting Method for Text Generating Sources
    DCC '97 Proceedings of the Conference on Data Compression
Text Compression by Context Tree Weighting
    DCC '97 Proceedings of the Conference on Data Compression
Multialphabet Coding with Separate Alphabet Description
    SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
The Design and Analysis of Efficient Lossless Data Compression Systems
Switching Between Two Universal Source Coding Algorithms
    DCC '98 Proceedings of the Conference on Data Compression
The context-tree weighting method: basic properties
    IEEE Transactions on Information Theory

Cited by (5)

Data Compression Method Combining Properties of PPM and CTW
    Progress in Discovery Science, Final Report of the Japanese Discovery Science Project
Prototyping of Efficient Hardware Algorithms for Data Compression in Future Communication Systems
    RSP '01 Proceedings of the 12th International Workshop on Rapid System Prototyping
Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition
    The Journal of Machine Learning Research
On prediction using variable order Markov models
    Journal of Artificial Intelligence Research
Non-repetitive DNA sequence compression using memoization
    ISBMDA'06 Proceedings of the 7th international conference on Biological and Medical Data Analysis

Abstract

The context tree weighting (CTW) method is a universal compression algorithm for FSMX sources. Although it is expected to achieve good compression ratios in practice, it is difficult to implement, and in many cases an implementation serves only to estimate the compression ratio. Willems and Tjalkens showed a practical implementation that uses conditional probabilities instead of block probabilities, but it applies only to binary-alphabet sequences. We extend the method to multi-alphabet sequences and show a simple implementation using PPM techniques. We also propose a method for optimizing a parameter of context tree weighting in the binary-alphabet case. Experimental results on texts and DNA sequences show that the performance of PPM can be improved by combining it with context tree weighting, and that DNA sequences can be compressed to less than 2.0 bits per character (bpc).
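
For readers unfamiliar with the method, the following is a minimal sketch of textbook binary context tree weighting, not the multi-alphabet, PPM-based implementation the abstract describes. It assumes the Krichevsky-Trofimov (KT) estimator at every node, equal weights of 1/2, block probabilities kept in the log domain, and an all-zero initial context. The names `Node`, `BinaryCTW`, `log_add`, `update`, and `code_length_bits` are hypothetical and chosen only for illustration.

```python
import math


class Node:
    """One context node: bit counts plus log-domain probabilities."""
    __slots__ = ("counts", "log_pe", "log_pw", "children")

    def __init__(self):
        self.counts = [0, 0]           # zeros and ones seen in this context
        self.log_pe = 0.0              # log of the KT estimate (empty sequence: 1)
        self.log_pw = 0.0              # log of the weighted probability
        self.children = [None, None]   # created lazily


def log_add(a, b):
    """Return log(exp(a) + exp(b)) without underflow."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


class BinaryCTW:
    def __init__(self, depth):
        self.depth = depth
        self.root = Node()
        self.history = [0] * depth     # assumption: all-zero initial context

    def update(self, bit):
        # Walk from the root along the most recent bits (newest first).
        path = [self.root]
        node = self.root
        for d in range(self.depth):
            c = self.history[-1 - d]
            if node.children[c] is None:
                node.children[c] = Node()
            node = node.children[c]
            path.append(node)
        # Update from the leaf back up to the root.
        for d in range(self.depth, -1, -1):
            node = path[d]
            a = node.counts[bit]
            n = node.counts[0] + node.counts[1]
            node.log_pe += math.log((a + 0.5) / (n + 1.0))  # KT update
            node.counts[bit] += 1
            if d == self.depth:
                node.log_pw = node.log_pe
            else:
                # A missing child has seen nothing, so it contributes log(1) = 0.
                kids = sum(ch.log_pw for ch in node.children if ch is not None)
                node.log_pw = math.log(0.5) + log_add(node.log_pe, kids)
        self.history.append(bit)

    def code_length_bits(self):
        """Ideal code length, -log2 of the weighted root probability."""
        return -self.root.log_pw / math.log(2)


if __name__ == "__main__":
    model = BinaryCTW(depth=2)
    for b in [0, 1] * 50:
        model.update(b)
    print(model.code_length_bits())
```

The conditional probability of the next bit, which is the quantity an arithmetic coder consumes, is the ratio of the weighted root probability after the update to the one before it. This sketch recomputes block probabilities for clarity and does not attempt the memory or speed optimizations a practical coder would need.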