Large-alphabet languages such as Chinese differ substantially from English and therefore pose different problems for text compression. In this article, we first examine the characteristics of Chinese, then introduce a new variant of the Prediction by Partial Match (PPM) model designed specifically for Chinese characters. Traditional PPM coding schemes encode an escape probability whenever a novel character occurs in the current context. The new scheme instead encodes the order first and then the symbol, so no escape probability has to be output. This scheme achieves excellent compression rates in comparison with other schemes on a variety of Chinese text files.
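The contrast between escape-based coding and order-first coding can be illustrated with a minimal Python sketch. It is not the scheme proposed in the article: it only lists the events each approach would hand to an arithmetic coder and assigns no probabilities. The function names, the fixed maximum order of 2, the use of order -1 for a never-seen symbol, and the simplification that every context from the highest order downward costs one escape are all assumptions made for illustration. An ASCII string stands in for Chinese text, since the mechanism is the same for any alphabet.

```python
from collections import defaultdict

MAX_ORDER = 2  # assumed maximum context length for this toy model


def train(text, max_order=MAX_ORDER):
    """Count which symbols follow each context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, sym in enumerate(text):
        for k in range(min(i, max_order) + 1):
            counts[text[i - k:i]][sym] += 1
    return counts


def find_order(counts, history, sym, max_order=MAX_ORDER):
    """Return the highest order whose context has seen sym, or -1 if none."""
    for k in range(min(len(history), max_order), -1, -1):
        ctx = history[len(history) - k:]
        if sym in counts.get(ctx, {}):
            return k
    return -1  # hypothetical order -1: symbol never seen in any context


def escape_events(counts, history, sym, max_order=MAX_ORDER):
    """Traditional style: one escape per order that fails, then the symbol."""
    start = min(len(history), max_order)
    found = find_order(counts, history, sym, max_order)
    return ["ESC"] * (start - found) + [sym]


def order_first_events(counts, history, sym, max_order=MAX_ORDER):
    """Order-first style: state the order once, then the symbol."""
    found = find_order(counts, history, sym, max_order)
    return [("ORDER", found), sym]


model = train("abracadabra")
print(escape_events(model, "abra", "d"))
print(order_first_events(model, "abra", "d"))
```

In this sketch the escape-based event list grows with the number of orders that fail to predict the symbol, while the order-first list always has two events. Whether that pays off in bits depends on how the order itself is modelled, which the sketch does not address.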