A method for speeding up text retrieval

  • Authors: Per-Ake Larson
  • Affiliations: University of Waterloo
  • Venue: ACM SIGMIS Database - Database techniques and models for the office environment: selected papers from the Database Week Conference
  • Year: 1984

Abstract

A simple method is presented for speeding up the term detection phase of retrieval from a full-text document database. The method uses a surrogate database in which each document is represented as a sequence of hash signatures, one for each term occurring in the original document. The surrogate database is expected to be only 5-10% of the size of the original database, and most of the work involved in term detection can be done on this smaller database. The surrogate database can either be scanned or inverted and used as an index. Term detection is sped up significantly, at the cost of additional processing when a document is filed.
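
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the hash function (SHA-256 truncated to 16 bits), the tokenizer, and all function names (`signature`, `build_surrogate`, `candidate_documents`, `invert`, `confirm`) are assumptions chosen for illustration, since the abstract specifies none of them. The sketch builds a surrogate of per-document signature sequences, shows both uses mentioned above (scanning and inverting), and confirms candidates against the original text, because different terms can hash to the same signature.

```python
import hashlib
import re

SIGNATURE_BITS = 16  # assumed signature width; the abstract does not give one


def signature(term):
    """Map a term to a small integer hash signature (illustrative choice of hash)."""
    digest = hashlib.sha256(term.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % (1 << SIGNATURE_BITS)


def tokenize(text):
    """Split text into alphanumeric terms (a deliberately simple tokenizer)."""
    return re.findall(r"[A-Za-z0-9]+", text)


def build_surrogate(documents):
    """Filing step: represent each document as a sequence of term signatures."""
    return {doc_id: [signature(t) for t in tokenize(text)]
            for doc_id, text in documents.items()}


def candidate_documents(surrogate, term):
    """Scan the surrogate for documents whose signatures include the term's."""
    sig = signature(term)
    return [doc_id for doc_id, sigs in surrogate.items() if sig in sigs]


def invert(surrogate):
    """Alternative to scanning: invert the surrogate into signature -> doc ids."""
    index = {}
    for doc_id, sigs in surrogate.items():
        for sig in set(sigs):
            index.setdefault(sig, set()).add(doc_id)
    return index


def confirm(documents, candidates, term):
    """Check candidates against the original text to discard hash collisions."""
    wanted = term.lower()
    return [doc_id for doc_id in candidates
            if wanted in {t.lower() for t in tokenize(documents[doc_id])}]


if __name__ == "__main__":
    docs = {
        1: "Hashing speeds up text retrieval",
        2: "Office databases store documents",
    }
    surrogate = build_surrogate(docs)
    hits = candidate_documents(surrogate, "retrieval")
    print(confirm(docs, hits, "retrieval"))
```

In this sketch a matching signature only marks a document as a candidate. A document that contains the term always matches, but a collision can also produce a match for a document that does not, which is why the final check against the original text is kept.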