Extending Huffman coding for multilingual text compression

Authors:
Chi-Kwun Kan; Ling Wong
Venue:
DCC '95 Proceedings of the Conference on Data Compression
Year:
1995

Cited by:
Multi-Lingual Cascading Text Compressors for WWW (ITCC '00 Proceedings of the International Conference on Information Technology: Coding and Computing)

Abstract

Summary form only given. We propose two new algorithms that exploit 16-bit or 32-bit character sampling and the distinctive features of languages with a large number of distinct characters to improve compression ratios for multilingual text documents. We chose Chinese, sampled as 16-bit characters, as the representative language in our study. The first approach, called static Chinese Huffman coding, treats each individual Chinese character as a symbol in the Huffman tree. Experimental results showed an improvement in compression ratio with this approach. The second approach, called dictionary-based Chinese Huffman coding, extends the Huffman coding to include Chinese words as well as single characters.
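The summary gives no algorithmic detail, so the following is only a minimal sketch under stated assumptions of why the choice of Huffman symbol matters. It Huffman-codes the same text three ways: as the individual bytes of a 16-bit encoding (a byte-oriented baseline, assumed here to be UTF-16-BE), as whole 16-bit characters (the idea behind static Chinese Huffman coding), and as dictionary words. The word segmentation uses greedy longest match against a tiny hypothetical dictionary; the paper's actual dictionary and segmentation method are not described. All function names are hypothetical, and the bit counts cover only the coded payload, not the cost of transmitting the code table.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code built from freqs."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {sym: 1 for sym in freqs}
    tie = count()  # tie-breaker so the heap never compares the dicts
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def coded_bits(symbols):
    """Payload bits when the symbol sequence is Huffman-coded (code table not counted)."""
    freqs = Counter(symbols)
    lengths = huffman_code_lengths(freqs)
    return sum(freqs[s] * lengths[s] for s in freqs)


def byte_symbols(text):
    # Assumed baseline: every byte of the UTF-16-BE encoding is a separate symbol.
    return list(text.encode("utf-16-be"))


def char_symbols(text):
    # One 16-bit character = one Huffman symbol.
    return list(text)


def word_symbols(text, dictionary, max_len=4):
    # Hypothetical greedy longest-match segmentation; characters not covered
    # by any dictionary word fall back to single-character symbols.
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in dictionary:
                out.append(piece)
                i += n
                break
    return out


if __name__ == "__main__":
    sample = "中文压缩中文压缩中文文本压缩"
    toy_dict = {"中文", "压缩", "文本"}  # hypothetical toy dictionary
    print("byte-level bits:", coded_bits(byte_symbols(sample)))
    print("char-level bits:", coded_bits(char_symbols(sample)))
    print("word-level bits:", coded_bits(word_symbols(sample, toy_dict)))
```

On this toy input the payload shrinks at each step (byte, then character, then word symbols), because larger symbols capture the structure that byte-level coding splits apart. The trade-off is a larger code table, which a real coder must store or agree on in advance; this sketch deliberately ignores that cost.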