Modeling for text compression

Authors:
Timothy Bell; Ian H. Witten; John G. Cleary
Affiliations:
Univ. of Canterbury, Christchurch, New Zealand; Univ. of Calgary, Calgary, Alta., Canada; Univ. of Calgary, Calgary, Alta., Canada
Venue:
ACM Computing Surveys (CSUR)
Year:
1989

Abstract

The best schemes for text compression use large models to help them predict which characters will come next. The actual next characters are coded with respect to the prediction, resulting in compression of information. Models are best formed adaptively, based on the text seen so far. This paper surveys successful strategies for adaptive modeling that are suitable for use in practical text compression systems.

The strategies fall into three main classes: finite-context modeling, in which the last few characters are used to condition the probability distribution for the next one; finite-state modeling, in which the distribution is conditioned by the current state (and which subsumes finite-context modeling as an important special case); and dictionary modeling, in which strings of characters are replaced by pointers into an evolving dictionary. A comparison of different methods on the same sample texts is included, along with an analysis of future research directions.
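
As an illustration of the finite-context idea described in the abstract, here is a minimal Python sketch. It is not taken from the paper. It estimates how many bits an ideal coder would need when each character is predicted from the preceding `order` characters, with counts updated adaptively after every character. The function name `adaptive_code_length`, the add-one (Laplace) smoothing, and the 256-symbol alphabet are all assumptions made for the example; practical schemes typically handle unseen characters differently.

```python
import math
from collections import defaultdict


def adaptive_code_length(text, order=2, alphabet_size=256):
    """Ideal code length in bits for `text` under an adaptive
    finite-context model of the given order.

    Assumptions (illustrative only): add-one smoothing over a fixed
    alphabet of `alphabet_size` symbols, and an ideal coder that
    spends exactly -log2(p) bits on a symbol of probability p.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, ch in enumerate(text):
        # The context is the last `order` characters (shorter at the start).
        ctx = text[max(0, i - order):i]
        # Predict from the counts gathered so far, before seeing `ch`.
        p = (counts[ctx][ch] + 1) / (totals[ctx] + alphabet_size)
        bits += -math.log2(p)
        # Adapt: update the model with the character just coded.
        counts[ctx][ch] += 1
        totals[ctx] += 1
    return bits


if __name__ == "__main__":
    sample = "the cat sat on the mat. " * 20
    for k in (0, 1, 2, 3):
        total = adaptive_code_length(sample, order=k)
        print(f"order {k}: {total / len(sample):.3f} bits per character")
```

Because the decoder can maintain the same counts from the text it has already decoded, no model needs to be transmitted in advance; this is the sense in which the model is formed adaptively.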