Static index pruning for information retrieval systems

  • Authors:
  • David Carmel;Doron Cohen;Ronald Fagin;Eitan Farchi;Michael Herscovici;Yoelle S. Maarek;Aya Soffer

  • Affiliations:
  • IBM Research Lab., Haifa, Israel;IBM Research Lab., Haifa, Israel;IBM Almaden Research Center, San Jose, CA;IBM Research Lab., Haifa, Israel;IBM Research Lab., Haifa, Israel;IBM Research Lab., Haifa, Israel;IBM Research Lab., Haifa, Israel

  • Venue:
  • Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2001

Abstract

We introduce static index pruning methods that significantly reduce the index size in information retrieval systems. We investigate uniform and term-based methods that each remove selected entries from the index and yet have only a minor effect on retrieval results. In uniform pruning, there is a fixed cutoff threshold, and all index entries whose contribution to relevance scores is bounded above by a given threshold are removed from the index. In term-based pruning, the cutoff threshold is determined for each term, and thus may vary from term to term. We give experimental evidence that for each level of compression, term-based pruning outperforms uniform pruning, under various measures of precision. We present theoretical and experimental evidence that under our term-based pruning scheme, it is possible to prune the index greatly and still get retrieval results that are almost as good as those based on the full index.
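
The two pruning styles described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy index layout (term mapped to per-document score contributions), the function names, and the parameters `k` and `epsilon` are all assumptions made for illustration. In particular, the abstract does not say how the per-term cutoff is chosen; the sketch assumes one plausible rule, a fraction `epsilon` of the k-th highest score in that term's posting list.

```python
from typing import Dict

# Hypothetical toy index: term -> {doc_id: score contribution of that posting}.
Index = Dict[str, Dict[str, float]]


def uniform_prune(index: Index, threshold: float) -> Index:
    """Uniform pruning: one fixed cutoff for all terms.

    Every posting whose score contribution is at most `threshold` is dropped.
    """
    return {
        term: {doc: s for doc, s in postings.items() if s > threshold}
        for term, postings in index.items()
    }


def term_based_prune(index: Index, k: int, epsilon: float) -> Index:
    """Term-based pruning: the cutoff varies from term to term.

    Assumed rule (not stated in the abstract): the cutoff for a term is
    epsilon times the k-th highest score in that term's posting list.
    """
    pruned: Index = {}
    for term, postings in index.items():
        ranked = sorted(postings.values(), reverse=True)
        # Terms with fewer than k postings get cutoff 0.0, so they keep everything.
        cutoff = epsilon * ranked[k - 1] if len(ranked) >= k else 0.0
        pruned[term] = {doc: s for doc, s in postings.items() if s >= cutoff}
    return pruned


def index_size(index: Index) -> int:
    """Total number of postings across all terms."""
    return sum(len(postings) for postings in index.values())


# Made-up scores: one term with high contributions, one with uniformly low ones.
toy_index: Index = {
    "retrieval": {"d1": 0.9, "d2": 0.5, "d3": 0.1},
    "the": {"d1": 0.05, "d2": 0.04, "d3": 0.03, "d4": 0.001},
}

uniform = uniform_prune(toy_index, threshold=0.05)
per_term = term_based_prune(toy_index, k=2, epsilon=0.5)
print(index_size(toy_index), index_size(uniform), index_size(per_term))
```

In this toy data the single global cutoff removes every posting for the low-scoring term, while the per-term cutoff trims each posting list relative to that term's own scores. This is only an intuition for why a per-term threshold might preserve results better; it is not evidence for the paper's experimental claims.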