Efficiently decodable and searchable natural language adaptive compression

  • Authors:
  • Nieves R. Brisaboa;Antonio Fariña;Gonzalo Navarro;José R. Paramá

  • Affiliations:
  • University da Coruña, A Coruña, Spain;University da Coruña, A Coruña, Spain;University of Chile, Santiago, Chile;University da Coruña, Coruña, Spain

  • Venue:
  • Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

We address the problem of adaptive compression of natural language text, focusing on the case where low bandwidth is available and the receiver has little processing power, as in mobile applications. Our technique achieves compression ratios around 32% and requires very little effort from the receiver. This tradeoff, not previously achieved with alternative techniques, is obtained by breaking the usual symmetry between sender and receiver dominant in statistical adaptive compression. Moreover, we show that our technique can be adapted to avoid decompression at all in cases where the receiver only wants to detect the presence of some keywords in the document. This is useful in scenarios such as selective dissemination of information, news clipping, alert systems, text categorization, and clustering. Thanks to the asymmetry we introduce, the receiver can search the compressed text much faster than the plain text. This was previously achieved only in semistatic compression scenarios.