Trigrams as index element in full text retrieval: observations and experimental results
CSC '93 Proceedings of the 1993 ACM conference on Computer science
The automatic extraction of words from texts, as input for information retrieval systems based on inverted files, is considered partly on a theoretical basis and partly in light of experience gained from developing what has become an operational system. The system was developed to operate on abstracted texts, but is being modified to handle more extended texts, either for input into an inverted file or as a stage in creating pre-coordinate indexes. It can handle compound words, homographs, and synonyms, and can identify particular forms of text (such as authors) on the basis of what are termed semantic markers.
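As a rough illustration of the general idea, the sketch below extracts words from short texts and builds an inverted file keyed either by whole words or by character trigrams. It is not the system described above: the function names (`extract_words`, `trigrams`, `build_inverted_file`), the tokenization rule, and the sample documents are all assumptions made for this example, and nothing here attempts the handling of compound words, homographs, synonyms, or semantic markers.

```python
import re
from collections import defaultdict


def extract_words(text):
    """Lowercase the text and return its alphabetic word tokens in order.

    Assumption: a word is a run of the letters a-z; real systems use richer rules.
    """
    return re.findall(r"[a-z]+", text.lower())


def trigrams(word):
    """Return the overlapping three-character substrings of a word.

    Words shorter than three characters yield an empty list.
    """
    return [word[i:i + 3] for i in range(len(word) - 2)]


def build_inverted_file(docs, use_trigrams=False):
    """Map each index element (word or trigram) to the sorted ids of the
    documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in extract_words(text):
            elements = trigrams(word) if use_trigrams else [word]
            for element in elements:
                index[element].add(doc_id)
    return {element: sorted(ids) for element, ids in index.items()}


# Hypothetical sample documents, keyed by document id.
docs = {1: "Inverted files index words", 2: "Trigrams index text"}
word_index = build_inverted_file(docs)
trigram_index = build_inverted_file(docs, use_trigrams=True)
```

Looking up `word_index["index"]` or `trigram_index["ind"]` returns the ids of the documents containing that element. A trigram index of this kind trades a larger posting structure for the ability to match substrings and word variants without a separate word list.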