A trigram is a three-element sequence of characters. In this paper we demonstrate the effectiveness of a trigram-based index for morphologically based retrieval from a full-text document retrieval system. A retrieved document is considered relevant if it contains an exact match for each of the query terms. Under this definition of relevance we consistently achieve a recall of 100%, since any document that contains a term necessarily contains every trigram of that term. In the experiments described here we used sets of 100 three-term conjunctive (ANDed) queries, and the average precision per set varied from 47% to 87%. We propose a method for increasing the average precision to 100%. Using overlapping trigrams extracted from the Brown Corpus [KUCE67] over a 45-character set, we found that the number of entries in a trigram-based index approaches a horizontal asymptote near 11,000. Finally, we show that a trigram-based system provides a reasonable alternative to a word-based one and is superior to it for retrieving word fragments.
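The index lookup and the recall/precision behaviour described above can be illustrated with a minimal sketch. It assumes a hypothetical 45-character set (26 letters, 10 digits, space and 8 punctuation marks; the paper's actual set is not given here), indexes every overlapping trigram of each normalized document, and answers an ANDed query by intersecting the posting lists of all trigrams of all terms. That intersection can never miss a document containing the terms, but it can admit false matches. The `exact_matches` step, which re-checks the stored text, is one straightforward way to reach 100% precision and is not necessarily the method the paper proposes. All names (`ALPHABET`, `normalize`, `TrigramIndex`, ...) are illustrative.

```python
import re
from collections import defaultdict

# Hypothetical 45-character set: 26 letters, 10 digits, space, 8 punctuation marks.
# This is an assumption for illustration, not the character set used in the paper.
ALPHABET = set("abcdefghijklmnopqrstuvwxyz0123456789 .,;:'-?!")


def normalize(text: str) -> str:
    """Lowercase, map characters outside ALPHABET to space, collapse runs of spaces."""
    mapped = "".join(c if c in ALPHABET else " " for c in text.lower())
    return re.sub(r" +", " ", mapped).strip()


def trigrams(text: str) -> set[str]:
    """Set of overlapping three-character substrings of already-normalized text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Inverted index from each trigram to the ids of documents containing it."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = [normalize(d) for d in docs]
        self.postings: dict[str, set[int]] = defaultdict(set)
        for doc_id, text in enumerate(self.docs):
            for tri in trigrams(text):
                self.postings[tri].add(doc_id)

    def num_entries(self) -> int:
        """Number of distinct trigrams, i.e. entries in the index."""
        return len(self.postings)

    def candidates(self, terms: list[str]) -> set[int]:
        """Documents containing every trigram of every ANDed term.

        This is a superset of the exact matches, so recall is 100%. Terms shorter
        than three characters yield no trigrams and therefore filter nothing.
        """
        result = set(range(len(self.docs)))
        for term in terms:
            for tri in trigrams(normalize(term)):
                result &= self.postings.get(tri, set())
        return result

    def exact_matches(self, terms: list[str]) -> set[int]:
        """Candidates whose stored text contains each normalized term as a substring."""
        norm = [normalize(t) for t in terms]
        return {d for d in self.candidates(terms) if all(t in self.docs[d] for t in norm)}


if __name__ == "__main__":
    idx = TrigramIndex(["Retrieval of documents.", "Retro trieval!", "Indexing text."])
    cand = idx.candidates(["retrieval"])     # {0, 1}: doc 1 holds all 7 trigrams of "retrieval"
    exact = idx.exact_matches(["retrieval"])  # {0}: only doc 0 really contains the term
    print(sorted(cand), sorted(exact), len(exact) / len(cand))
    print(sorted(idx.exact_matches(["trie"])))  # word fragment found in docs 0 and 1
```

Document 1 ("retro trieval!") shows why index-only precision falls below 100%: it contains every trigram of "retrieval" (ret, etr, tri, rie, iev, eva, val) without containing the word itself. As for index size, a 45-character set allows at most 45^3 = 91,125 distinct trigrams, so an asymptote near 11,000 corresponds to roughly 12% of the possible entries.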