Efficiently decodable and searchable natural language adaptive compression

Authors:
Nieves R. Brisaboa;Antonio Fariña;Gonzalo Navarro;José R. Paramá
Affiliations:
Universidade da Coruña, A Coruña, Spain;Universidade da Coruña, A Coruña, Spain;University of Chile, Santiago, Chile;Universidade da Coruña, A Coruña, Spain
Venue:
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2005

Abstract

We address the problem of adaptive compression of natural language text, focusing on the case where only low bandwidth is available and the receiver has little processing power, as in mobile applications. Our technique achieves compression ratios around 32% and requires very little effort from the receiver. This tradeoff, not previously achieved with alternative techniques, is obtained by breaking the symmetry between sender and receiver that usually dominates statistical adaptive compression. Moreover, we show that our technique can be adapted to avoid decompression altogether when the receiver only wants to detect the presence of some keywords in the document. This is useful in scenarios such as selective dissemination of information, news clipping, alert systems, text categorization, and clustering. Thanks to the asymmetry we introduce, the receiver can search the compressed text much faster than the plain text. This was previously achieved only in semistatic compression scenarios.
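
The abstract does not spell out the mechanism, so the following is only a hypothetical illustration of what an asymmetric adaptive scheme could look like, not the authors' algorithm. It assumes a word-based model in which the sender keeps all frequency statistics and tells the receiver explicitly when two vocabulary slots trade places; the receiver only appends, looks up, and swaps. The function names (`encode`, `decode`, `count_occurrences`) and the tuple-based token format are inventions for this sketch. A real compressor would emit compact byte codes rather than Python tuples, and nothing here reproduces the reported compression ratio.

```python
def encode(words):
    """Sender: does all statistical bookkeeping (assumed design)."""
    position = {}  # word -> current slot in the vocabulary
    vocab = []     # slot -> word
    freq = []      # slot -> count, kept in non-increasing order
    for w in words:
        if w not in position:
            # First occurrence: ship the word itself; it takes the last slot.
            position[w] = len(vocab)
            vocab.append(w)
            freq.append(1)
            yield ("new", w)
            continue
        i = position[w]
        yield ("code", i)  # small slot numbers go to frequent words
        freq[i] += 1
        # Find the topmost slot whose count is now smaller than this word's.
        j = i
        while j > 0 and freq[j - 1] < freq[i]:
            j -= 1
        if j != i:
            vocab[i], vocab[j] = vocab[j], vocab[i]
            freq[i], freq[j] = freq[j], freq[i]
            position[vocab[i]] = i
            position[vocab[j]] = j
            yield ("swap", i, j)  # tell the receiver to mirror the swap


def decode(tokens):
    """Receiver: no counting, only array append, lookup, and swap."""
    vocab = []
    for tok in tokens:
        if tok[0] == "new":
            vocab.append(tok[1])
            yield tok[1]
        elif tok[0] == "code":
            yield vocab[tok[1]]
        else:  # ("swap", i, j)
            _, i, j = tok
            vocab[i], vocab[j] = vocab[j], vocab[i]


def count_occurrences(tokens, keyword):
    """Keyword search on the token stream without rebuilding the text."""
    slot = None    # current slot of the keyword, once it has been seen
    n_words = 0    # number of vocabulary slots announced so far
    hits = 0
    for tok in tokens:
        if tok[0] == "new":
            if tok[1] == keyword:
                slot = n_words
                hits += 1
            n_words += 1
        elif tok[0] == "code":
            if tok[1] == slot:
                hits += 1
        else:  # follow the keyword if its slot is swapped
            _, i, j = tok
            if slot == i:
                slot = j
            elif slot == j:
                slot = i
    return hits


text = "a b b a b c c c c".split()
tokens = list(encode(text))
restored = list(decode(tokens))
```

In this sketch the searcher compares integers for every coded word and touches a string only when a new word is announced, which is one plausible reading of why searching the compressed stream can be cheaper than scanning plain text.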