Implementing ranking strategies using text signatures

  • Authors:
  • W. Bruce Croft; Pasquale Savino

  • Affiliations:
  • Univ. of Massachusetts, Amherst; Ing. C. Olivetti & Co., Pisa, Italy

  • Venue:
  • ACM Transactions on Information Systems (TOIS)
  • Year:
  • 1988

Abstract

Signature files provide an efficient access method for text in documents, but retrieval is usually limited to finding documents that contain a specified Boolean pattern of words. Effective retrieval requires that documents with similar meanings be found through a process of plausible inference. The simplest way of implementing this retrieval process is to rank documents in order of their probability of relevance. This paper describes techniques for implementing probabilistic ranking strategies with sequential and bit-sliced signature files and points out the limitations of these implementations with regard to their effectiveness. A detailed comparison is made between signature-based ranking techniques and ranking using term-based document representatives and inverted files. The comparison shows that term-based representations are at least competitive with signature files in terms of efficiency and, in some situations, superior.
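To make the three access methods in the abstract concrete, here is a minimal Python sketch that ranks documents over a sequential signature file, a bit-sliced signature file, and an inverted file. It is a hypothetical illustration, not the authors' implementation. It assumes superimposed coding with MD5-derived bit positions, a 64-bit signature (`SIG_BITS`) with 3 bits per term (`BITS_PER_TERM`), and precomputed query-term weights standing in for the probabilistic term weights. All function names are hypothetical. The sequential scan and the bit-sliced lookup compute identical scores. False drops, where a term's bits happen to be set by other terms, can only raise a signature score above the exact inverted-file score.

```python
import hashlib
from collections import defaultdict

# Hypothetical parameters chosen for illustration; not taken from the paper.
SIG_BITS = 64       # width of each document signature, in bits
BITS_PER_TERM = 3   # bit positions set per term (superimposed coding)


def term_signature(term: str) -> int:
    """Map a term to up to BITS_PER_TERM bit positions via MD5 hashing."""
    sig = 0
    for i in range(BITS_PER_TERM):
        digest = hashlib.md5(f"{i}:{term}".encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def doc_signature(terms) -> int:
    """Superimpose (OR together) the signatures of all terms in a document."""
    sig = 0
    for term in terms:
        sig |= term_signature(term)
    return sig


def rank_sequential(query_weights, doc_sigs):
    """Sequential signature file: scan every document signature for each query."""
    scores = [0.0] * len(doc_sigs)
    term_sigs = {t: term_signature(t) for t in query_weights}
    for doc_id, dsig in enumerate(doc_sigs):
        for term, weight in query_weights.items():
            tsig = term_sigs[term]
            # All of the term's bits are set, so the term *may* be present
            # (a false drop is possible).
            if (dsig & tsig) == tsig:
                scores[doc_id] += weight
    return scores


def build_bit_slices(doc_sigs):
    """Bit-sliced file: slice b is a bitmap of the documents whose bit b is set."""
    slices = [0] * SIG_BITS
    for doc_id, dsig in enumerate(doc_sigs):
        for b in range(SIG_BITS):
            if (dsig >> b) & 1:
                slices[b] |= 1 << doc_id
    return slices


def rank_bitsliced(query_weights, slices, n_docs):
    """Read only the slices for each query term's bits and AND them together."""
    scores = [0.0] * n_docs
    for term, weight in query_weights.items():
        tsig = term_signature(term)
        candidates = (1 << n_docs) - 1  # start with every document
        for b in range(SIG_BITS):
            if (tsig >> b) & 1:
                candidates &= slices[b]
        for doc_id in range(n_docs):
            if (candidates >> doc_id) & 1:
                scores[doc_id] += weight
    return scores


def rank_inverted(query_weights, docs):
    """Inverted file: exact postings lists, so there are no false drops."""
    index = defaultdict(set)
    for doc_id, terms in enumerate(docs):
        for term in terms:
            index[term].add(doc_id)
    scores = [0.0] * len(docs)
    for term, weight in query_weights.items():
        for doc_id in index.get(term, ()):
            scores[doc_id] += weight
    return scores


def ranked(scores):
    """Document ids in decreasing score order (ties broken by id)."""
    return sorted(range(len(scores)), key=lambda d: (-scores[d], d))
```

In this sketch a signature records only that a term's bits are set. Each document is therefore scored on binary term presence, while an inverted file could also carry within-document term frequencies in its postings. The bit-sliced variant reads only `BITS_PER_TERM` slices per query term instead of every full signature, which is the usual motivation for bit slicing.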