One-time complete indexing of text: theory and practice

  • Authors:
  • Raymond J. D'Amore; Clinton P. Mah

  • Affiliations:
  • PAR Technology Corporation, 7926 Jones Branch Drive Suite 170, McLean, Virginia; PAR Technology Corporation, 7926 Jones Branch Drive Suite 170, McLean, Virginia

  • Venue:
  • SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 1985

Abstract

Indexing according to occurrences of selected word fragments, called “n-grams”, offers a significant alternative to keyword indexing and full text scanning in the design of document-based information systems. Finite sets of n-grams can be selected to allow effective fixed indexing of all words, numbers, and special terms in text. The characteristics of such indexing can be modeled statistically and validated over a wide range of text. The model provides a descriptive and predictive tool for controlling precision and recall in searching and for scaling estimates of relevance to an adaptive reference noise distribution for a target collection. Special techniques such as partial inversion of index terms, probabilistic ordering of index terms, and various types of data compression allow n-gram indexing to be competitive in performance with other approaches.
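The basic idea of indexing text by word fragments can be illustrated with a small sketch. This is not the authors' method: it assumes fixed-length character trigrams (n = 3) with a blank padding character at each word boundary, rather than a selected finite set of n-grams, and the names `ngrams`, `tokenize`, and `NGramIndex` are hypothetical. Every word or number is broken into overlapping fragments, each fragment maps to the set of documents containing it, and a query term is matched by intersecting the posting sets of its fragments. Because a document can contain all of a term's fragments without containing the term itself (a false drop, one source of the "noise" the abstract refers to), the sketch simply rescans candidate documents to confirm matches; the abstract instead describes modeling this noise statistically.

```python
import re
from collections import defaultdict


def ngrams(word: str, n: int = 3) -> set[str]:
    """Overlapping character n-grams of a word, padded with a blank on each
    side so word boundaries are represented (an assumption of this sketch)."""
    padded = f" {word.lower()} "
    if len(padded) < n:
        return {padded}
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def tokenize(text: str) -> list[str]:
    # Words, numbers, and alphanumeric special terms all become index units.
    return re.findall(r"[A-Za-z0-9]+", text)


class NGramIndex:
    """Hypothetical fixed n-gram inverted index over a list of documents."""

    def __init__(self, n: int = 3):
        self.n = n
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.docs: list[str] = []

    def add(self, text: str) -> int:
        doc_id = len(self.docs)
        self.docs.append(text)
        for word in tokenize(text):
            for gram in ngrams(word, self.n):
                self.postings[gram].add(doc_id)
        return doc_id

    def candidates(self, term: str) -> set[int]:
        """Documents containing every n-gram of the term (may include false drops)."""
        # Intersect the shortest posting sets first: a common heuristic,
        # not the probabilistic ordering described in the abstract.
        grams = sorted(ngrams(term, self.n),
                       key=lambda g: len(self.postings.get(g, ())))
        result = None
        for gram in grams:
            docs = self.postings.get(gram, set())
            result = set(docs) if result is None else result & docs
            if not result:
                break
        return result or set()

    def search(self, term: str) -> set[int]:
        """Candidates confirmed by rescanning the document's words."""
        target = term.lower()
        return {d for d in self.candidates(term)
                if target in (w.lower() for w in tokenize(self.docs[d]))}
```

For example, after indexing the single document "abc bcd", `candidates("abcd")` returns that document, since its two words together supply all four trigrams of "abcd", while `search("abcd")` returns an empty set once the rescan rejects the false drop.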