Improved techniques for processing queries in full-text systems

  • Authors:
  • Y. Choueka; A. Fraenkel; S. Klein; E. Segal

  • Affiliations:
  • Inst. for Information Retrieval and Computational Linguistics (IRCOL) -- The Responsa Project and Department of Mathematics and Computer Science, Bar-Ilan University, Ramat Gan, Israel and On sabb ...; Department of Applied Mathematics, The Weizmann Institute of Science, Rehovot 76100, Israel; Department of Applied Mathematics, The Weizmann Institute of Science, Rehovot 76100, Israel; Inst. for Information Retrieval and Computational Linguistics (IRCOL) -- The Responsa Project

  • Venue:
  • SIGIR '87 Proceedings of the 10th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 1987

Abstract

In static full-text retrieval systems, which accommodate metrical as well as Boolean operators, the traditional approach to query processing uses a “concordance”, from which large sets of coordinates are retrieved and then merged and/or collated. Alternatively, in a system with l documents, the concordance can be replaced by a set of bit-maps of fixed length l, one constructed for every distinct word of the database, which serve as occurrence maps. We propose to combine the concordance and bit-map approaches, and show how this can speed up the processing of queries: fast ANDing and ORing of the maps in a preprocessing stage leads to large I/O savings in collating the coordinates of the keywords needed to satisfy the metrical and Boolean constraints. Moreover, the bit-maps give partial information on the distribution of the coordinates of the keywords, which can be used when queries must be processed in stages because of their complexity and the sizes of the sets of coordinates involved. The new techniques are partially implemented at the Responsa Retrieval Project.
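
The combined approach described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a toy in-memory model in which a Python integer stands in for each fixed-length bit-map (bit d set when the word occurs in document d) and a nested dictionary stands in for the concordance. The names `build_index` and `within_distance`, the whitespace tokenization, and the use of word offsets as coordinates are all assumptions made for illustration. The point it shows is the order of work: the bit-maps are ANDed first, and coordinates are fetched and collated only for the documents that survive.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy concordance and one document bit-map per word (hypothetical helper)."""
    concordance = defaultdict(lambda: defaultdict(list))  # word -> doc id -> word offsets
    bitmaps = defaultdict(int)                            # word -> int used as a bit-map
    for doc_id, text in enumerate(docs):
        for pos, word in enumerate(text.lower().split()):
            concordance[word][doc_id].append(pos)
            bitmaps[word] |= 1 << doc_id                  # mark occurrence in this document
    return concordance, bitmaps


def within_distance(concordance, bitmaps, a, b, max_dist):
    """Return ids of documents where a and b occur at most max_dist words apart."""
    # Preprocessing stage: AND the bit-maps to keep only documents containing both words.
    candidates = bitmaps.get(a, 0) & bitmaps.get(b, 0)
    hits = []
    doc_id = 0
    while candidates:
        if candidates & 1:
            # Only now are coordinates looked up, and only for surviving documents.
            pos_a = concordance[a][doc_id]
            pos_b = concordance[b][doc_id]
            if any(abs(x - y) <= max_dist for x in pos_a for y in pos_b):
                hits.append(doc_id)
        candidates >>= 1
        doc_id += 1
    return hits


docs = [
    "the quick brown fox",
    "fox and the hound",
    "brown bear and quick fox jumps",
    "nothing here",
]
concordance, bitmaps = build_index(docs)
print(within_distance(concordance, bitmaps, "quick", "fox", 2))  # prints [0, 2]
```

In this toy data, document 1 contains "fox" but not "quick", so the AND of the two maps drops it before any of its coordinates are touched. In a disk-based system that skipped lookup is where the I/O saving would come from; here everything is in memory, so the sketch only demonstrates the control flow.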