The automatic extraction of words from texts especially for input into information retrieval systems based on inverted files

  • Authors:
  • Kevin P. Jones; Colin L. M. Bell

  • Affiliations:
  • Malaysian Rubber Producers' Research Association, Tun Abdul Razak Laboratory, Brickendonbury, Hertford, England (both authors)

  • Venue:
  • SIGIR '84: Proceedings of the 7th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 1984

Abstract

The automatic extraction of words from texts to form the input for information retrieval systems based on inverted files is considered partly on a theoretical basis and partly in the light of experience gained in developing what has become an operational system. The system was developed to operate on abstracts, but is being modified to handle longer texts, either for input into an inverted file or as a stage in creating pre-coordinate indexes. It can handle compound words, homographs, and synonyms, and can identify particular forms of text (such as author names) on the basis of what are termed semantic markers.
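To make the general idea concrete, the following is a minimal sketch of word extraction feeding an inverted file, with compound words joined into single terms and synonyms mapped to a preferred term. It is not the authors' system: the stop list, the compound and synonym tables, and the function names `extract_terms` and `build_inverted_file` are hypothetical, compounds are limited to two words, and homograph resolution and semantic markers are not modelled.

```python
import re
from collections import defaultdict

# Hypothetical stop list (illustrative only; not the paper's word lists).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "is", "to", "with"}

# Hypothetical two-word compounds to be indexed as single terms.
COMPOUNDS = {("natural", "rubber"), ("inverted", "file")}

# Hypothetical synonym table mapping a variant term to a preferred term.
SYNONYMS = {"caoutchouc": "natural rubber"}


def extract_terms(text):
    """Return index terms: lowercase words, two-word compounds joined, synonyms normalised."""
    # Simplification: only alphabetic runs count as words (digits, hyphens dropped).
    words = re.findall(r"[a-z]+", text.lower())
    terms = []
    i = 0
    while i < len(words):
        # Prefer a compound when the next word completes one.
        if i + 1 < len(words) and (words[i], words[i + 1]) in COMPOUNDS:
            term = words[i] + " " + words[i + 1]
            i += 2
        else:
            term = words[i]
            i += 1
        if term in STOPWORDS:
            continue
        # Replace a variant with its preferred term, if one is listed.
        terms.append(SYNONYMS.get(term, term))
    return terms


def build_inverted_file(docs):
    """Map each term to the sorted list of document ids whose text yields it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in extract_terms(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


if __name__ == "__main__":
    docs = {
        1: "Oxidation of natural rubber in the presence of sulphur.",
        2: "Caoutchouc and synthetic rubber compared.",
        3: "Building an inverted file for rubber abstracts.",
    }
    index = build_inverted_file(docs)
    print(index["natural rubber"])  # [1, 2]
    print(index["rubber"])          # [2, 3]
```

Because the synonym step runs after compounds are joined, document 2 is retrieved under "natural rubber" even though that phrase does not occur in it, while the single word "rubber" is posted separately for documents 2 and 3.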