Unbounded length contexts for PPM

  • Authors:
  • J. G. Cleary; W. J. Teahan; I. H. Witten

  • Venue:
  • DCC '95 Proceedings of the Conference on Data Compression

  • Year:
  • 1995

Abstract

The prediction by partial matching (PPM) data compression scheme has set the performance standard in lossless compression of text throughout the past decade. The original algorithm was first published in 1984 by Cleary and Witten, and a series of improvements was described by Moffat (1990), culminating in a careful implementation, called PPMC, which has become the benchmark version. This still achieves results superior to virtually all other compression methods, despite many attempts to better it. PPM is a finite-context statistical modeling technique that can be viewed as blending together several fixed-order context models to predict the next character in the input sequence. Prediction probabilities for each context in the model are calculated from frequency counts, which are updated adaptively, and the symbol that actually occurs is encoded relative to its predicted distribution using arithmetic coding. The paper describes a new algorithm, PPM*, which exploits contexts of unbounded length. It reliably achieves compression superior to PPMC, although our current implementation uses considerably greater computational resources (both time and space). The basic PPM compression scheme is described, showing the use of contexts of unbounded length and how it can be implemented using a tree data structure. Some results are given that demonstrate an improvement of about 6% over the old method.
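
The blending idea described in the abstract can be illustrated with a short sketch. The Python below is a minimal, illustrative model only, not the paper's algorithm: the class name SimplePPM, the bounded max_order, and the simple escape estimate are assumptions made for the example, and the arithmetic coder that would consume the predicted distribution is omitted. PPM* itself removes the order bound and stores contexts in a tree structure.

```python
from collections import defaultdict


class SimplePPM:
    """Toy PPM-style blender: fixed-order frequency counts combined
    through a crude escape mechanism (hypothetical sketch, not the
    PPMC/PPM* escape estimation from the paper)."""

    def __init__(self, max_order=3):
        self.max_order = max_order
        # counts[context][symbol] -> adaptive frequency count
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, history, symbol):
        # After a symbol is coded, bump its count in every context order.
        for order in range(min(self.max_order, len(history)) + 1):
            context = history[len(history) - order:]
            self.counts[context][symbol] += 1

    def predict(self, history):
        # Blend models from the longest available context down to order 0,
        # passing leftover probability mass to shorter contexts ("escape").
        probs = defaultdict(float)
        remaining = 1.0
        for order in range(min(self.max_order, len(history)), -1, -1):
            context = history[len(history) - order:]
            seen = self.counts.get(context)
            if not seen:
                continue
            total = sum(seen.values())
            escape = len(seen) / (total + len(seen))  # rough escape estimate
            for sym, cnt in seen.items():
                probs[sym] += remaining * (1.0 - escape) * cnt / total
            remaining *= escape
        return dict(probs)  # mass left in `remaining` would cover novel symbols


# Usage: predict before coding each character, then update adaptively.
model = SimplePPM(max_order=2)
text = "abracadabra"
for i, ch in enumerate(text):
    dist = model.predict(text[:i])  # distribution an arithmetic coder would use
    model.update(text[:i], ch)
```

In the PPM* scheme the abstract introduces, the fixed upper bound on context order would be dropped and the contexts maintained in a tree data structure; the sketch above only mirrors the finite-order blending that the abstract attributes to the original PPM.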