Shortest-substring retrieval and ranking

  • Authors:
  • Charles L. A. Clarke; Gordon V. Cormack

  • Affiliations:
  • Univ. of Toronto, Toronto, Ont. Canada; Univ. of Waterloo, Waterloo, Ont. Canada

  • Venue:
  • ACM Transactions on Information Systems (TOIS)
  • Year:
  • 2000

Abstract

We present a model for arbitrary passage retrieval using Boolean queries. The model is applied to the task of ranking documents, or other structural elements, in order of their expected relevance. Features such as phrase matching, truncation, and stemming integrate naturally into the model. Properties of Boolean algebra are obeyed, and the exact-match semantics of Boolean retrieval are preserved. Simple inverted-list file structures provide an efficient implementation. Retrieval effectiveness is comparable to that of standard ranking techniques. Because no global statistics are used, the method is of particular value in distributed environments. Because ranking is based on arbitrary passages, the structural elements to be ranked may be specified at query time rather than restricted to predefined elements.
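The idea of ranking by shortest matching substrings can be illustrated with a small sketch. It is not the authors' implementation and handles only a conjunction (AND) of terms. It assumes that the collection is treated as one sequence of word positions, that each term's inverted list is a sorted Python list of non-negative integer positions, and that each document is an extent (start, end) over that sequence. `minimal_covers` finds every interval that contains all query terms and has no shorter sub-interval that also does. `rank_extents` then scores each document by the covers it contains. The weighting (1 for a cover of length at most `k`, otherwise `k / length`) and the parameter `k` are hypothetical, illustrative choices, not values taken from the paper.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Interval = Tuple[int, int]


def next_pos(postings: List[int], k: int) -> Optional[int]:
    """First position >= k in a sorted posting list, or None if there is none."""
    i = bisect_left(postings, k)
    return postings[i] if i < len(postings) else None


def prev_pos(postings: List[int], k: int) -> Optional[int]:
    """Last position <= k in a sorted posting list, or None if there is none."""
    i = bisect_right(postings, k)
    return postings[i - 1] if i > 0 else None


def minimal_covers(lists: List[List[int]]) -> List[Interval]:
    """All minimal intervals containing at least one position from every list (AND).

    Assumes positions are non-negative integers, so the scan starts at 0.
    """
    if not lists:
        return []
    covers: List[Interval] = []
    u = 0
    while True:
        # Earliest end: every term must occur at or after u, up to q.
        ends = [next_pos(p, u) for p in lists]
        if any(e is None for e in ends):
            return covers  # some term has no occurrence at or after u
        q = max(ends)
        # Latest start: slide the left edge right as far as possible while
        # still keeping one occurrence of every term inside [p, q].
        p = min(prev_pos(plist, q) for plist in lists)
        covers.append((p, q))
        u = p + 1  # the next minimal cover must start after p


def rank_extents(extents: List[Interval], covers: List[Interval],
                 k: int = 16) -> List[Tuple[Interval, float]]:
    """Score each extent by the covers it fully contains, highest score first.

    Hypothetical weighting: a cover of length <= k adds 1, a longer one adds
    k / length, so shorter matching substrings count for more.
    """
    scored = []
    for start, end in extents:
        score = 0.0
        for p, q in covers:
            if start <= p and q <= end:
                length = q - p + 1
                score += 1.0 if length <= k else k / length
        scored.append(((start, end), score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

For example, with one term at positions `[1, 5, 9]` and another at `[3, 10]`, `minimal_covers` returns `[(1, 3), (3, 5), (9, 10)]`. With documents `(0, 4)` and `(5, 12)` and `k=2`, the second document ranks first because it contains the shorter cover `(9, 10)`. The scan uses only "next" and "previous" lookups on sorted posting lists and needs no collection-wide statistics, which loosely mirrors the inverted-list and no-global-statistics points in the abstract.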