New adaptive compressors for natural language text

Authors:
N. R. Brisaboa;A. Fariña;G. Navarro;J. R. Paramá
Affiliations:
Database Laboratory, Department of Computer Science, University of A Coruña, Campus de Elviña s/n, 15071, A Coruña, Spain;Database Laboratory, Department of Computer Science, University of A Coruña, Campus de Elviña s/n, 15071, A Coruña, Spain;Center for Web Research, Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile;Database Laboratory, Department of Computer Science, University of A Coruña, Campus de Elviña s/n, 15071, A Coruña, Spain
Venue:
Software—Practice & Experience
Year:
2008

References:
Dynamic Huffman coding. Journal of Algorithms.
A locally adaptive data compression scheme. Communications of the ACM.
Word-based text compression. Software—Practice & Experience.
Fast text searching: allowing errors. Communications of the ACM.
Fast and flexible word searching on compressed text. ACM Transactions on Information Systems (TOIS).
A fast string searching algorithm. Communications of the ACM.
Information Retrieval: Computational and Theoretical Aspects (book).
Compression and Coding Algorithms (book).
Flexible pattern matching in strings: practical on-line search algorithms for texts and biological sequences (book).
Factor Oracle: A New Structure for Pattern Matching. Proceedings of SOFSEM '99, the 26th Conference on Current Trends in Theory and Practice of Informatics.
Efficiently decodable and searchable natural language adaptive compression. Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
LZgrep: a Boyer–Moore string matching tool for Ziv–Lempel compressed text. Software—Practice & Experience.
Lightweight natural language text compression. Information Retrieval.
Enhanced byte codes with restricted prefix properties. Proceedings of SPIRE 2005, the 12th International Conference on String Processing and Information Retrieval.

Abstract

Semistatic byte-oriented word-based compression codes have been shown to be an attractive alternative for compressing natural language text databases, because of the combination of speed, effectiveness, and direct searchability they offer. In particular, our recently proposed family of dense compression codes has been shown to be superior to the more traditional byte-oriented word-based Huffman codes in most aspects. In this paper, we focus on the problem of transmitting texts between peers that do not share the vocabulary, which is the typical scenario for adaptive compression methods. We design adaptive variants of our semistatic dense codes and show that they are much simpler and faster than dynamic Huffman codes while reaching almost the same compression effectiveness. We also show that our variants offer a very compelling trade-off among compression and decompression speed, compression ratio, and search speed when compared with most state-of-the-art general-purpose compressors. A preliminary, partial version of this work appeared in [1]. Copyright © 2008 John Wiley & Sons, Ltd.
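The abstract refers to dense codes and to adaptive variants of them without showing how they operate. The Python sketch below illustrates the general idea under stated assumptions. It implements End-Tagged Dense Code (ETDC), a member of the dense code family in which only the last byte of each codeword has its high bit set, and a simplified adaptive coder in which sender and receiver keep identical frequency-sorted vocabularies and update them after every word. The names `etdc_encode`, `etdc_decode`, `AdaptiveModel`, `compress`, and `decompress`, the use of the first unused codeword as an escape announcing a new word, and the NUL-terminated plain spelling of new words are all hypothetical choices of this sketch, not details taken from the paper.

```python
# Minimal sketch, not the paper's implementation: ETDC codewords plus a
# simplified adaptive word-based coder. All names are hypothetical.


def etdc_encode(rank: int) -> bytes:
    """ETDC codeword for a 0-based frequency rank.

    Only the last byte is tagged (128..255); earlier bytes are 0..127,
    so the end of every codeword is self-marking.
    """
    out = [128 + rank % 128]          # tagged final byte
    rank //= 128
    while rank > 0:                   # prepend untagged bytes
        rank -= 1
        out.append(rank % 128)
        rank //= 128
    return bytes(reversed(out))


def etdc_decode(code: bytes) -> int:
    """Inverse of etdc_encode for one complete codeword."""
    rank = 0
    for b in code[:-1]:
        rank = rank * 128 + b + 1
    return rank * 128 + (code[-1] - 128)


class AdaptiveModel:
    """Vocabulary sorted by decreasing frequency. Sender and receiver apply
    exactly the same updates, so they always agree on every word's rank."""

    def __init__(self) -> None:
        self.words: list[str] = []      # rank -> word
        self.freq: list[int] = []       # rank -> frequency
        self.pos: dict[str, int] = {}   # word -> rank
        self.top: dict[int, int] = {}   # frequency -> first rank holding it

    def add(self, word: str) -> None:
        # A new word enters at the end with frequency 1 (all others are >= 1).
        self.top.setdefault(1, len(self.words))
        self.pos[word] = len(self.words)
        self.words.append(word)
        self.freq.append(1)

    def increment(self, i: int) -> None:
        # Swap the word with the first word of equal frequency, then bump it;
        # this keeps the list sorted without shifting other entries.
        f = self.freq[i]
        j = self.top[f]
        wi, wj = self.words[i], self.words[j]
        self.words[i], self.words[j] = wj, wi
        self.pos[wj], self.pos[wi] = i, j
        self.freq[j] = f + 1
        if j + 1 < len(self.freq) and self.freq[j + 1] == f:
            self.top[f] = j + 1         # group f now starts one slot later
        else:
            del self.top[f]             # group f became empty
        self.top.setdefault(f + 1, j)   # j starts group f+1 if it is new


def compress(words: list[str]) -> bytes:
    model, out = AdaptiveModel(), bytearray()
    for w in words:
        i = model.pos.get(w)
        if i is None:
            # Assumed escape: codeword of the first unused rank, then the
            # word itself in UTF-8 followed by a NUL terminator.
            out += etdc_encode(len(model.words))
            out += w.encode("utf-8") + b"\x00"
            model.add(w)
        else:
            out += etdc_encode(i)
            model.increment(i)
    return bytes(out)


def decompress(data: bytes) -> list[str]:
    model, words, k = AdaptiveModel(), [], 0
    while k < len(data):
        start = k
        while data[k] < 128:            # skip untagged bytes
            k += 1
        k += 1                          # include the tagged final byte
        i = etdc_decode(data[start:k])
        if i == len(model.words):       # escape: a new word follows
            end = data.index(0, k)
            w = data[k:end].decode("utf-8")
            k = end + 1
            model.add(w)
        else:
            w = model.words[i]
            model.increment(i)
        words.append(w)
    return words
```

Because a codeword depends only on the word's current rank, and both ends repeat the same update after every word, this kind of scheme never needs to send the vocabulary ahead of the text. The cost is that the receiver must maintain the same model as the sender while decoding.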