The “concordance” of an information retrieval system can often be stored in the form of bit-maps, which are usually very sparse and should therefore be compressed. Hierarchical bit-vector compression consists of partitioning a vector v_i into equal-sized blocks, constructing a new bit-vector v_{i+1} whose bits mark the non-zero blocks of v_i, dropping the zero blocks of v_i, and repeating the process for v_{i+1}. We refine the method by pruning tree branches that ultimately point to very few documents; the corresponding document numbers are instead added to an appended list, which is compressed by the prefix-omission technique. The new method was thoroughly tested on the bit-maps of the Responsa Retrieval Project and gave a relative improvement of about 40% over the conventional hierarchical compression method.
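The basic hierarchical scheme described above can be illustrated with a minimal Python sketch. It covers only the conventional method (block partitioning, flag vectors, dropping zero blocks), not the pruning refinement or the prefix-omission list. The function names, the default block size of 4, the zero-padding of the last block, and the choice to stop once a vector fits in a single block are all assumptions made for illustration; the abstract does not specify them.

```python
def split_blocks(bits, block_size):
    """Split a list of 0/1 values into blocks, zero-padding the last one."""
    padded = bits + [0] * (-len(bits) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


def compress(bits, block_size=4):
    """Return (levels, lengths).

    levels[i] holds only the non-zero blocks of v_i; the final entry holds
    the top-level vector in full. lengths[i] is the original length of v_i,
    kept so that padding can be removed on decompression.
    """
    levels, lengths = [], []
    current = list(bits)
    # Assumed stopping rule: stop once the vector fits in one block.
    while len(current) > block_size:
        blocks = split_blocks(current, block_size)
        lengths.append(len(current))
        levels.append([b for b in blocks if any(b)])      # drop zero blocks of v_i
        current = [1 if any(b) else 0 for b in blocks]    # v_{i+1}: one flag per block
    lengths.append(len(current))
    levels.append([current])                              # top level is stored whole
    return levels, lengths


def decompress(levels, lengths, block_size=4):
    """Rebuild the original bit list from the output of compress()."""
    current = levels[-1][0]
    lower = zip(reversed(levels[:-1]), reversed(lengths[:-1]))
    for kept_blocks, length in lower:
        remaining = iter(kept_blocks)
        rebuilt = []
        for flag in current:
            # A set flag means the next stored block belongs here;
            # a clear flag means the block was all zeros and was dropped.
            rebuilt.extend(next(remaining) if flag else [0] * block_size)
        current = rebuilt[:length]                        # strip zero-padding
    return current


# Usage: a sparse 64-bit map with three set bits survives a round trip.
bitmap = [0] * 64
for position in (5, 6, 40):
    bitmap[position] = 1
levels, lengths = compress(bitmap)
assert decompress(levels, lengths) == bitmap
```

The sparser the bit-map, the more zero blocks are dropped at each level, which is where the saving comes from; a real implementation would pack the bits into machine words rather than Python lists.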