High Performance Word-Codeword Mapping Algorithm on PPM

  • Authors: Joaquín Adiego, Miguel A. Martinez-Prieto, Pablo de la Fuente

  • Venue: DCC '09 Proceedings of the 2009 Data Compression Conference
  • Year: 2009

Abstract

The word-codeword mapping technique allows words to be managed in PPM modelling when a natural language text file is compressed. The main idea is to assign codes to words in order to improve compression. Previous work focused on proposing and evaluating several adaptive mapping algorithms. In this paper, we propose a semi-static word-codeword mapping method that takes advantage of prior knowledge of some statistical data about the vocabulary. We test our idea by implementing a basic prototype, dubbed mppm2, which retains all the desirable features of a word-codeword mapping technique. A comparison with other techniques and compressors shows that our proposal is a very competitive choice for compressing natural language texts: empirical results show that the prototype achieves very good compression for this type of document.
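
To make the idea of a semi-static word-codeword mapping concrete, the following is a minimal sketch and not the mppm2 implementation. It assumes a two-pass scheme: the first pass gathers vocabulary statistics, and the second pass replaces each token with a codeword whose value is the token's frequency rank. The function names (build_mapping, encode, decode), the tokenization rule, and the fixed 2-byte codeword width are all illustrative assumptions. In a real pipeline the resulting byte stream would then be handed to a PPM compressor, which is omitted here.

```python
import re
from collections import Counter

# Assumption: a token is a maximal run of word characters or of non-word
# characters, so concatenating the tokens reproduces the text exactly.
TOKEN_RE = re.compile(r"\w+|\W+")


def build_mapping(text):
    """First pass: rank tokens by descending frequency (ties broken by token order)."""
    counts = Counter(TOKEN_RE.findall(text))
    vocab = sorted(counts, key=lambda tok: (-counts[tok], tok))
    if len(vocab) > 65536:
        raise ValueError("vocabulary does not fit in 2-byte codewords")
    return vocab


def encode(text, vocab):
    """Second pass: replace each token with its 2-byte big-endian rank."""
    rank = {tok: i for i, tok in enumerate(vocab)}
    out = bytearray()
    for tok in TOKEN_RE.findall(text):
        out += rank[tok].to_bytes(2, "big")
    return bytes(out)


def decode(data, vocab):
    """Invert encode(): read 2-byte codewords and look the tokens up again."""
    return "".join(
        vocab[int.from_bytes(data[i:i + 2], "big")]
        for i in range(0, len(data), 2)
    )
```

Because the mapping is built before encoding starts, the decoder needs the vocabulary as side information. Frequent tokens receive small rank values, which is one plausible way for prior statistical knowledge to make the codeword stream more predictable for a byte-oriented model.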