A general-purpose compression scheme for large collections
ACM Transactions on Information Systems (TOIS)
Block Merging for Off-Line Compression
CPM '02 Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching
Relative Lempel-Ziv factorization for efficient storage and retrieval of web collections
Proceedings of the VLDB Endowment
Compression of databases not only reduces space requirements but can also reduce overall retrieval times. We have described elsewhere our RAY algorithm for compressing databases containing general-purpose data, such as images, sound, and text. We describe here an extension to the RAY compression algorithm that permits its use on very large databases. In this approach, we build a model based on a small training set and use the model to compress large databases. Our preliminary implementation is slow in compression, but only slightly slower in decompression than the popular GZIP scheme. Importantly, we show that the compression effectiveness of our approach is excellent and markedly better than that of the GZIP and COMPRESS algorithms on our test sets.
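The idea of building a model from a small training set and reusing it across a large database can be illustrated with a minimal sketch. This is not the RAY algorithm: it substitutes DEFLATE with a preset dictionary, using Python's standard `zlib` module, purely to show the train-once, compress-many pattern. The function names (`build_model`, `compress_record`, `decompress_record`) and the sample records are hypothetical.

```python
import zlib


def build_model(training_records, max_size=32768):
    """Build a 'model' from a small training set.

    Assumption for illustration: the model is simply the concatenated
    training records, used as a DEFLATE preset dictionary. RAY builds a
    different kind of model; this only mirrors the overall workflow.
    """
    model = b"".join(training_records)
    # DEFLATE can only reference the final 32 KiB of a preset dictionary.
    return model[-max_size:]


def compress_record(record, model):
    """Compress one record independently, using the shared model."""
    comp = zlib.compressobj(level=9, zdict=model)
    return comp.compress(record) + comp.flush()


def decompress_record(blob, model):
    """Decompress one record; the same model must be supplied."""
    decomp = zlib.decompressobj(zdict=model)
    return decomp.decompress(blob) + decomp.flush()


# Hypothetical sample data: a tiny training set and one unseen record.
training = [
    b"name=alice;city=melbourne;status=active\n",
    b"name=bob;city=sydney;status=inactive\n",
]
model = build_model(training)
record = b"name=carol;city=melbourne;status=active\n"
blob = compress_record(record, model)
restored = decompress_record(blob, model)
```

Because each record is compressed on its own against a fixed model, any single record can be decompressed without touching the rest of the database, which is the property that matters when retrieval time is a concern.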