PPM with the extended alphabet

Authors:
Przemysław Skibiński
Affiliations:
Institute of Computer Science, University of Wrocław, Przesmyckiego 20, 51-151 Wrocław, Poland
Venue:
Information Sciences: an International Journal
Year:
2006

References:
Word-based text compression. Software: Practice and Experience.
Modeling for text compression. ACM Computing Surveys (CSUR).
Text compression.
Data compression with long repeated strings. Information Sciences: an International Journal, Dictionary based compression.
An approach to phrase selection for offline data compression. ACSC '02 Proceedings of the twenty-fifth Australasian conference on Computer science, Volume 4.
LZP: A New Data Compression Algorithm. DCC '96 Proceedings of the Conference on Data Compression.
A Corpus for the Evaluation of Lossless Compression Algorithms. DCC '97 Proceedings of the Conference on Data Compression.
Offline Dictionary-Based Compression. DCC '99 Proceedings of the Conference on Data Compression.
Compression of Biological Sequences by Greedy Off-Line Textual Substitution. DCC '00 Proceedings of the Conference on Data Compression.
The Design and Analysis of Efficient Lossless Data Compression Systems.
Tag Based Models of English Text. DCC '98 Proceedings of the Conference on Data Compression.
Switching Between Two Universal Source Coding Algorithms. DCC '98 Proceedings of the Conference on Data Compression.
PPM: One Step to Practicality. DCC '02 Proceedings of the Data Compression Conference.
Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, Mathematics and computing.
A universal algorithm for sequential data compression. IEEE Transactions on Information Theory.

Cited by:
Computing the λ-covers of a string. Information Sciences: an International Journal.
Natural Language Compression on Edge-Guided text preprocessing. Information Sciences: an International Journal.
Fast decoding algorithms for variable-lengths codes. Information Sciences: an International Journal.

Abstract

In this paper we propose a modification of Prediction by Partial Matching (PPM), a lossless data compression algorithm, that extends the alphabet used by the PPM method to include long repeated strings. The PPM algorithm's alphabet usually consists of only 256 characters. Using the Calgary corpus [T.C. Bell, J. Cleary, I.H. Witten, Text compression. Advanced Reference Series, Prentice Hall, Englewood Cliffs, New Jersey, 1990], we show that for ordinary files this modification improves compression performance at lower orders, that is, orders not greater than 10. For some kinds of files, however, it yields much better compression than any known lossless data compression algorithm.
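The abstract describes the extended alphabet only at a high level. The Python sketch below is a minimal illustration under stated assumptions, not the paper's method. Symbols 0 to 255 stand for ordinary bytes, and each selected long repeated string gets a new symbol number 256 + i. In this sketch, an order-k context therefore holds k extended symbols, which may cover far more than k bytes. The following are all hypothetical: the selection rule (fixed-length windows of length `MIN_LEN = 32` that occur at least twice), the greedy longest-match parsing, and the helper names. `context_counts` only builds the per-context frequency tables that a PPM model would consult. It omits PPM's escape mechanism and the arithmetic coder.

```python
from collections import Counter, defaultdict

# Hypothetical parameter: minimum length of a "long" repeated string.
MIN_LEN = 32


def find_repeated_strings(data: bytes, min_len: int = MIN_LEN) -> list[bytes]:
    """Return every length-`min_len` window of `data` occurring at least twice.

    Hypothetical selection rule for illustration; counted occurrences may overlap.
    """
    windows = Counter(data[i:i + min_len] for i in range(len(data) - min_len + 1))
    return sorted(w for w, c in windows.items() if c >= 2)


def to_extended_alphabet(data: bytes, phrases: list[bytes]) -> list[int]:
    """Parse `data` greedily: symbols 0..255 are bytes, 256 + i is phrases[i]."""
    code = {p: 256 + i for i, p in enumerate(phrases)}
    lengths = sorted({len(p) for p in phrases}, reverse=True)
    symbols, i = [], 0
    while i < len(data):
        for n in lengths:  # try the longest phrases first
            chunk = data[i:i + n]
            sym = code.get(chunk)
            if sym is not None:
                symbols.append(sym)
                i += len(chunk)
                break
        else:  # no phrase starts here: emit the ordinary byte symbol
            symbols.append(data[i])
            i += 1
    return symbols


def from_extended_alphabet(symbols: list[int], phrases: list[bytes]) -> bytes:
    """Invert to_extended_alphabet by expanding phrase symbols back to bytes."""
    out = bytearray()
    for s in symbols:
        if s < 256:
            out.append(s)
        else:
            out += phrases[s - 256]
    return bytes(out)


def context_counts(symbols: list[int], order: int) -> dict:
    """Count which symbol follows each order-`order` context (no escapes)."""
    table = defaultdict(Counter)
    for i in range(order, len(symbols)):
        table[tuple(symbols[i - order:i])][symbols[i]] += 1
    return table


if __name__ == "__main__":
    block = b"The quick brown fox jumps over the lazy dog. "
    text = block * 3 + b"PPM " + block
    phrases = find_repeated_strings(text)
    symbols = to_extended_alphabet(text, phrases)
    assert from_extended_alphabet(symbols, phrases) == text
    print(len(text), "bytes ->", len(symbols), "extended symbols")
    print(len(context_counts(symbols, 2)), "distinct order-2 contexts")
```

A real compressor would also have to transmit `phrases` (or enough information to rebuild them) to the decoder. This sketch ignores that cost, which a practical scheme would need to weigh against the savings from shorter symbol sequences.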