Multi-Lingual Cascading Text Compressors for WWW

  • Authors:
  • Chi-Hung Chi

  • Venue:
  • ITCC '00 Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC'00)
  • Year:
  • 2000

Abstract

Global sharing and distribution of information on the Internet have created a great demand for efficient multi-lingual text compression in web server and proxy implementations. Current text compressors such as Huffman coding, Lempel-Ziv (LZ) variants, and LZ-Huffman cascading fail to perform efficiently on multilingual text because their character sampling size is mismatched to the text and the character set is large. Our previous research [7,8] showed that a better compression ratio can be obtained by readjusting the character-sampling rate. In this paper, we investigate the cascading of LZ variants with Huffman coding for multilingual documents. Two basic approaches, based on static and dynamic dictionaries, are proposed. Techniques for reducing the dictionary overhead are also suggested. On our multi-lingual corpus, our adaptive cascading scheme outperforms the well-known cascading compressor gzip by about 20% on average.
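The sampling-size mismatch mentioned in the abstract can be illustrated with a small sketch. It is not the scheme proposed in the paper, whose details are not given here. It only compares the zeroth-order entropy bound (the size an ideal static Huffman-style coder approaches) when the same text is read as 8-bit symbols versus 16-bit symbols. The sample string, the UTF-16 encoding, and the function names are assumptions chosen for illustration, and the bound ignores the cost of transmitting the code table.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Zeroth-order (memoryless) entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def ideal_size_bits(data: bytes, symbol_width: int) -> float:
    """Entropy bound on the coded size when data is split into fixed-width symbols.

    symbol_width is the sampling size in bytes (1 = byte-wise, 2 = 16-bit characters).
    """
    if len(data) % symbol_width:
        raise ValueError("data length must be a multiple of symbol_width")
    symbols = [data[i:i + symbol_width] for i in range(0, len(data), symbol_width)]
    return entropy_bits_per_symbol(symbols) * len(symbols)


# Hypothetical sample: 16 distinct Chinese characters repeated 50 times.
sample = "壓縮多語言文本需要合適的取樣大小" * 50
data = sample.encode("utf-16-be")  # assumed 2-byte encoding, one unit per character

byte_bits = ideal_size_bits(data, 1)  # compressor samples one byte at a time
char_bits = ideal_size_bits(data, 2)  # compressor samples one 16-bit character at a time

print(f"8-bit sampling : {byte_bits:.0f} bits")
print(f"16-bit sampling: {char_bits:.0f} bits")
```

Under these assumptions the 16-bit bound is never larger than the 8-bit one, because byte-wise sampling mixes the high and low bytes of each character into a single distribution and discards the dependence between them. A real comparison would also have to charge the larger code table that 16-bit symbols require, which is the dictionary overhead the abstract refers to.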