Models of English Text

Authors:
W. J. Teahan; John G. Cleary
Venue:
DCC '97 Proceedings of the Conference on Data Compression
Year:
1997

Citing: 0
Cited by: 3

Universal Text Preprocessing for Data Compression (IEEE Transactions on Computers)
Revisiting dictionary-based compression: Research Articles (Software—Practice & Experience)
Natural Language Compression on Edge-Guided text preprocessing (Information Sciences: an International Journal)


Abstract

The problem of constructing models of English text is considered. A number of applications of such models including cryptology, spelling correction and speech recognition are reviewed. The best current models for English text have been the result of research into compression. Not only is this an important application of such models but the amount of compression provides a measure of how well such models perform. Three main classes of models are considered: character based models, word based models, and models which use auxiliary information in the form of parts of speech. These models are compared in terms of their memory usage and compression.
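
The idea that compression measures model quality can be illustrated with a minimal sketch. It is not the authors' models: it assumes a fixed-order character model with add-one smoothing and a 256-symbol alphabet, and the function name and parameters are hypothetical. It reports the cross-entropy of a test string in bits per character, which is the code length an ideal arithmetic coder driven by this model would approach; lower values mean a better model.

```python
import math
from collections import Counter, defaultdict


def bits_per_char(train, test, order=2, alphabet_size=256):
    """Cross-entropy of `test`, in bits per character, under a fixed-order
    character model estimated from `train`.

    Assumptions (illustrative only): add-one smoothing, a 256-symbol
    alphabet, and the first `order` characters of `test` are not coded.
    """
    # Count how often each character follows each context of length `order`.
    counts = defaultdict(Counter)
    for i in range(order, len(train)):
        counts[train[i - order:i]][train[i]] += 1

    total_bits = 0.0
    coded = 0
    for i in range(order, len(test)):
        ctx = counts.get(test[i - order:i], Counter())
        # Add-one smoothing keeps unseen characters at non-zero probability.
        p = (ctx[test[i]] + 1) / (sum(ctx.values()) + alphabet_size)
        total_bits -= math.log2(p)
        coded += 1
    return total_bits / coded


if __name__ == "__main__":
    sample = "the cat sat on the mat " * 50
    print(bits_per_char("", "the cat sat on the mat "))      # untrained model
    print(bits_per_char(sample, "the cat sat on the mat "))  # trained model
```

With no training text every character costs exactly log2(256) = 8 bits; training on related text lowers the figure, which is the sense in which the amount of compression scores the model.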