Extending Huffman coding for multilingual text compression

  • Authors:
  • Chi-Kwun Kan;Ling Wong

  • Affiliations:
  • -;-

  • Venue:
  • DCC '95 Proceedings of the Conference on Data Compression
  • Year:
  • 1995

Quantified Score

Hi-index 0.01

Visualization

Abstract

Summary form only given. We propose two new algorithms that are based on the 16-bit or 32-bit sampling character set and on the unique features of languages with a large number of distinct characters to improve the data compression ratios for multilingual text documents. We choose Chinese language using 16 bit character sampling as the representative language in our study. The first approach, called the static Chinese Huffman coding, introduces the concept of a single Chinese character in the Huffman tree. Experimental results showed that the improvement in compression ratio obtained. The second approach, called the dictionary-based Chinese Huffman coding, includes the concept of Chinese words in the Huffman coding.