Software—Practice & Experience
Text compression
Fast searching on compressed text allowing errors
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Fast and flexible word searching on compressed text
ACM Transactions on Information Systems (TOIS)
Information Retrieval: Computational and Theoretical Aspects
Information Retrieval: Computational and Theoretical Aspects
Modern Information Retrieval
Flexible pattern matching in strings: practical on-line search algorithms for texts and biological sequences
Adding Compression to Block Addressing Inverted Indexes
Information Retrieval
Text Compression for Dynamic Document Databases
IEEE Transactions on Knowledge and Data Engineering
Boyer-Moore String Matching over Ziv-Lempel Compressed Text
COM '00 Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching
An efficient compression code for text databases
ECIR'03 Proceedings of the 25th European conference on IR research
Dynamic lightweight text compression
ACM Transactions on Information Systems (TOIS)
A fast dynamic compression scheme for natural language texts
Computers & Mathematics with Applications
Hi-index | 0.00 |
We present a new statistical compression method, which we call Phrase Based Dense Code (PBDC), aimed at compressing large digital libraries. PBDC compresses the text collection to 30–32% of its original size, permits maintaining the text compressed all the time, and offers efficient on-line information retrieval services. The novelty of PBDC is that it supports continuous growing of the compressed text collection, by automatically adapting the vocabulary both to new words and to changes in the word frequency distribution, without degrading the compression ratio. Text compressed with PBDC can be searched directly without decompression, using fast Boyer-Moore algorithms. It is also possible to decompress arbitrary portions of the collection. Alternative compression methods oriented to information retrieval focus on static collections and thus are less well suited to digital libraries.