Natural language compression has made great progress over the last two decades. The main step in this evolution was the introduction of word-based compression by Moffat. Another improvement came with the so-called Dense codes, which compress and decompress very quickly while keeping a good compression ratio and allowing direct search in the compressed text. Many variants of Dense codes have been described, each with its own definition. In this paper, we present a generalized concept of dense coding called Open Dense Code (ODC), which aims to serve as a framework for defining many other dense code schemes. ODC highlights the features common to dense code schemes while still allowing the differences between them to be expressed. Within the ODC framework, we present two new word-based statistical compression algorithms based on the dense coding idea: Two Byte Dense Code (TBDC) and Self-Tuning Dense Code (STDC). Our algorithms improve the compression ratio and perform well on smaller files, which other compressors very often neglect.
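To make the dense coding idea concrete, the following is a minimal sketch of a generic byte-oriented dense code in the style of the (s, c)-Dense Code known from the literature. It is not the ODC, TBDC, or STDC definition, none of which is given in this abstract. The function names, the tie-breaking rule, and the byte layout (values 0..s-1 end a codeword, values s..255 continue it) are assumptions made for illustration only.

```python
from collections import Counter


def dense_encode(rank, s=128, c=128):
    """Encode a 0-based frequency rank as a dense codeword.

    Assumed layout: byte values 0..s-1 are "stoppers" (last byte of a
    codeword) and values s..s+c-1 are "continuers", with s + c == 256.
    """
    out = [rank % s]               # final byte is always a stopper
    x = rank // s
    while x > 0:
        x -= 1
        out.append(s + x % c)      # every preceding byte is a continuer
        x //= c
    return bytes(reversed(out))


def dense_decode(code, s=128, c=128):
    """Invert dense_encode for a single complete codeword."""
    rank = 0
    for b in code[:-1]:
        rank = rank * c + (b - s) + 1
    return rank * s + code[-1]


def build_ranks(words):
    """Rank words by decreasing frequency (ties broken alphabetically,
    an arbitrary choice made for this sketch)."""
    counts = Counter(words)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: i for i, w in enumerate(ordered)}


words = "to be or not to be that is the question".split()
ranks = build_ranks(words)
encoded = b"".join(dense_encode(ranks[w]) for w in words)
```

Because every codeword ends at the first stopper byte, the encoded stream can be split into codewords without a separate index, which is what makes searching directly in the compressed text possible. With s = c = 128, the 128 most frequent words get one byte each and the next 128 × 128 words get two bytes; other splits of the 256 byte values trade one-byte capacity against two-byte capacity.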