Edge-guided natural language text compression

  • Authors:
  • Joaquín Adiego Adiego; Miguel A. Martínez-Prieto; Pablo De La Fuente

  • Affiliations:
  • Depto. de Informática, Universidad de Valladolid, Valladolid, Spain; Depto. de Informática, Universidad de Valladolid, Valladolid, Spain; Depto. de Informática, Universidad de Valladolid, Valladolid, Spain

  • Venue:
  • SPIRE'07: Proceedings of the 14th International Conference on String Processing and Information Retrieval
  • Year:
  • 2007

Abstract

We describe a novel compression technique for natural language text collections that exploits the information provided by the edges of a graph used to model the text. We call this technique edge-guided compression. We propose an algorithm that transforms the text according to the edge-guided technique in conjunction with the spaceless words transformation. The result of these transformations is a PPM-friendly byte stream, which is then encoded with a PPM-family compressor. A comparison with state-of-the-art compressors shows that our proposal is a competitive choice for medium-sized and large natural language text collections.
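The abstract does not spell out the transformation, so the Python sketch below only illustrates the general idea under stated assumptions. It has two stages. The first is a spaceless words tokenizer, in which a single space between two words is left implicit. The second is a hypothetical edge-guided pass over a word graph. In that graph each vertex is a token, and each directed edge links a token to the token that follows it. When the next token is already a successor of the current vertex, the pass emits the index of that edge (`("E", k)`). Otherwise it emits the token literally (`("L", token)`) and adds the edge. The function names, the `E`/`L` symbol pair and the edge-ordering rule are illustrative assumptions, not the authors' algorithm. The final PPM encoding step is omitted because Python's standard library ships no PPM coder.

```python
import re
from typing import Dict, List, Optional, Tuple

TOKEN_RE = re.compile(r"\w+|\W+")  # maximal runs: words and separators alternate
WORD_RE = re.compile(r"\w+")

# Hypothetical symbol alphabet: ("E", edge_index) or ("L", literal_token)
Symbol = Tuple[str, object]


def spaceless_tokens(text: str) -> List[str]:
    """Spaceless words model: drop a single space that sits between two words."""
    tokens = TOKEN_RE.findall(text)
    last = len(tokens) - 1
    # Tokens alternate word/separator, so an interior " " always lies between
    # two words and can be restored implicitly; leading/trailing ones are kept.
    return [t for i, t in enumerate(tokens) if not (t == " " and 0 < i < last)]


def restore(tokens: List[str]) -> str:
    """Inverse of spaceless_tokens: re-insert one space between adjacent words."""
    out: List[str] = []
    prev_is_word = False
    for tok in tokens:
        is_word = WORD_RE.fullmatch(tok) is not None
        if is_word and prev_is_word:
            out.append(" ")
        out.append(tok)
        prev_is_word = is_word
    return "".join(out)


def edge_guided_encode(tokens: List[str]) -> List[Symbol]:
    """Hypothetical edge-guided pass over a graph of consecutive tokens."""
    successors: Dict[Optional[str], List[str]] = {}  # vertex -> targets, creation order
    symbols: List[Symbol] = []
    prev: Optional[str] = None  # None acts as a virtual start vertex
    for tok in tokens:
        succ = successors.setdefault(prev, [])
        if tok in succ:
            symbols.append(("E", succ.index(tok)))  # follow an existing edge
        else:
            succ.append(tok)  # create the edge prev -> tok
            symbols.append(("L", tok))  # emit the token literally
        prev = tok
    return symbols


def edge_guided_decode(symbols: List[Symbol]) -> List[str]:
    """Rebuild the same graph while decoding, mirroring the encoder step by step."""
    successors: Dict[Optional[str], List[str]] = {}
    tokens: List[str] = []
    prev: Optional[str] = None
    for kind, value in symbols:
        succ = successors.setdefault(prev, [])
        if kind == "E":
            tok = succ[value]  # edge index refers to prev's successor list
        else:
            tok = value
            succ.append(tok)
        tokens.append(tok)
        prev = tok
    return tokens


if __name__ == "__main__":
    text = "the cat saw the cat and the dog"
    syms = edge_guided_encode(spaceless_tokens(text))
    print(syms)
    assert restore(edge_guided_decode(syms)) == text
```

For `"the cat saw the cat and the dog"`, the second `the → cat` transition is coded as `("E", 0)`, and decoding followed by `restore` reproduces the input exactly. Repeated transitions become small integers, which is the kind of regularity a PPM model could exploit. In a real system, the symbol stream would still need to be serialized to bytes before the PPM stage.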