Improved hierarchical bit-vector compression in document retrieval systems

  • Authors:
  • A. S. Fraenkel;S. T. Klein;Y. Choueka;E. Segal

  • Affiliations:
  • Department of Applied Mathematics, The Weizmann Institute of Science, Rehovot 76100, Israel;Department of Applied Mathematics, The Weizmann Institute of Science, Rehovot 76100, Israel;Inst. for Information Retrieval and Computational Linguistics (IRCOL) - The Responsa Project and Department of Mathematics and Computer Science, Bar-Ilan University, Ramat Gan, Israel;Inst. for Information Retrieval and Computational Linguistics (IRCOL) - The Responsa Project

  • Venue:
  • Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 1986

Abstract

The “concordance” of an information retrieval system can often be stored in the form of bit-maps, which are usually very sparse and should therefore be compressed. Hierarchical bit-vector compression partitions a vector $v_i$ into equal-sized blocks, constructs a new bit-vector $v_{i+1}$ that points to the non-zero blocks of $v_i$, drops the zero-blocks of $v_i$, and repeats the process for $v_{i+1}$. We refine the method by pruning tree branches that ultimately point to very few documents; these document numbers are then added to an appended list, which is compressed by the prefix-omission technique. The new method was thoroughly tested on the bit-maps of the Responsa Retrieval Project and gave a relative improvement of about 40% over the conventional hierarchical compression method.
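To make the conventional scheme concrete, here is a minimal Python sketch of hierarchical bit-vector compression as described above, without the pruning refinement or the prefix-omission list. The block size `k`, the stopping rule (stop once a level fits in a single block and store it verbatim), and the names `compress`, `decompress` and `to_bitvector` are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Tuple


def to_bitvector(doc_ids: List[int], n: int) -> List[int]:
    """Hypothetical helper: bit-map of length n with bit d set for each document number d."""
    bits = [0] * n
    for d in doc_ids:
        bits[d] = 1
    return bits


def compress(v0: List[int], k: int) -> Tuple[int, List[List[int]]]:
    """Hierarchically compress bit-vector v0 with block size k (assumed k >= 2).

    Returns (len(v0), levels): levels[i] holds only the non-zero blocks of v_i,
    and the last entry is the top-level vector, stored uncompressed.
    """
    if k < 2:
        raise ValueError("block size k must be at least 2")
    levels: List[List[int]] = []
    v = v0
    while True:
        padded = v + [0] * (-len(v) % k)  # zero-pad to a multiple of k
        blocks = [padded[j:j + k] for j in range(0, len(padded), k)]
        # v_{i+1} has one bit per block of v_i, set iff that block is non-zero
        nxt = [1 if any(b) else 0 for b in blocks]
        # drop the zero-blocks of v_i and keep the non-zero ones in order
        levels.append([bit for b in blocks if any(b) for bit in b])
        if len(nxt) <= k:  # assumed stopping rule: top level fits in one block
            levels.append(nxt)
            return len(v0), levels
        v = nxt


def decompress(n0: int, k: int, levels: List[List[int]]) -> List[int]:
    """Invert compress(): rebuild v0 by expanding the levels from the top down."""
    # Uncompressed length of every level: n_{i+1} = ceil(n_i / k)
    lengths = [n0]
    for _ in range(len(levels) - 1):
        lengths.append(-(-lengths[-1] // k))
    full = levels[-1]  # top level is stored verbatim
    for i in range(len(levels) - 2, -1, -1):
        stored, pos = levels[i], 0
        expanded: List[int] = []
        for flag in full:
            if flag:  # non-zero block: take the next k stored bits
                expanded.extend(stored[pos:pos + k])
                pos += k
            else:  # zero block was dropped: restore k zeros
                expanded.extend([0] * k)
        full = expanded[:lengths[i]]  # strip the zero padding
    return full


if __name__ == "__main__":
    bits = to_bitvector([3, 40], 64)
    n, levels = compress(bits, 4)
    print(sum(len(level) for level in levels))  # 20 bits stored instead of 64
    assert decompress(n, 4, levels) == bits
```

With `k = 4`, the sparse 64-bit map above is stored in 20 bits: 8 bits of non-zero blocks at each of the two lower levels plus a 4-bit top vector. The pruning refinement described in the abstract would additionally cut off branches that lead to only a few set bits and move those document numbers to a separate list, which this sketch does not attempt.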