The automatic extraction of words from texts especially for input into information retrieval systems based on inverted files

Authors:
Kevin P. Jones; Colin L. M. Bell
Affiliations:
Malaysian Rubber Producers' Research Association, Tun Abdul Razak Laboratory, Brickendonbury, Hertford, England (both authors)
Venue:
SIGIR '84 Proceedings of the 7th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
1984

Citing: 0
Cited by: 1

Trigrams as index element in full text retrieval: observations and experimental results. CSC '93 Proceedings of the 1993 ACM conference on Computer science

Abstract

This paper considers the automatic extraction of words from texts to form the input for information retrieval systems based on inverted files, partly on a theoretical basis and partly in relation to experience gained from developing what has become an operational system. The system was developed to operate on abstracted texts, but is being modified to handle more extended texts, either for input into an inverted file or as a stage in creating pre-coordinate indexes. It can handle compound words, homographs, and synonyms, and can identify particular forms of text (such as authors) on the basis of what are termed semantic markers.
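
As a rough illustration of the general idea, and not the authors' actual method, the following minimal Python sketch extracts words from short texts, joins known two-word compounds, maps synonyms to a preferred term, and posts the results to an inverted file. The tables `COMPOUNDS` and `SYNONYMS`, the function names, and the sample texts are all hypothetical and invented for this example; homograph resolution and semantic markers are not attempted.

```python
import re
from collections import defaultdict

# Hypothetical lookup tables, invented purely for illustration.
COMPOUNDS = {("natural", "rubber"): "natural rubber"}
SYNONYMS = {"caoutchouc": "rubber"}


def extract_terms(text):
    """Return index terms, joining known compounds and mapping synonyms."""
    words = re.findall(r"[a-z]+", text.lower())
    terms = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in COMPOUNDS:
            # Treat a recognised two-word compound as a single term.
            terms.append(COMPOUNDS[pair])
            i += 2
        else:
            # Replace a synonym by its preferred term, else keep the word.
            terms.append(SYNONYMS.get(words[i], words[i]))
            i += 1
    return terms


def build_inverted_file(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in extract_terms(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical sample texts.
docs = {1: "Natural rubber ages slowly.", 2: "Caoutchouc is rubber."}
index = build_inverted_file(docs)
```

In this sketch the compound in document 1 is posted as a single term, so a search on `rubber` alone retrieves only document 2, where the synonym has been mapped to the preferred term. A practical system would also need a stop list and much larger tables.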