Trigrams as index element in full text retrieval: observations and experimental results

Authors:
Elizabeth S. Adams;Arnold C. Meltzer
Affiliations:
Hood College, Department of Mathematics and Computer Science, Frederick, MD;George Washington University, Department of Electrical Engineering and Computer Science, Washington, DC
Venue:
CSC '93 Proceedings of the 1993 ACM conference on Computer science
Year:
1993

References:
Effective text compression with simultaneous digram and trigram encoding. Journal of Information Science.
Optimizing a text retrieval system utilizing N-gram indexing.
A study of trigrams and their feasibility as index terms in a full text information retrieval system.
One-time complete indexing of text: theory and practice. SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval.
A stop list for general text. ACM SIGIR Forum.
Introduction to Modern Information Retrieval.
Comparative analysis of hardware versus software text search. SIGIR '80 Proceedings of the 3rd annual ACM conference on Research and development in information retrieval.
The automatic extraction of words from texts especially for input into information retrieval systems based on inverted files. SIGIR '84 Proceedings of the 7th annual international ACM SIGIR conference on Research and development in information retrieval.

Cited by:
Comparing inverted files and signature files for searching a large lexicon. Information Processing and Management: an International Journal - Special issue: Cross-language information retrieval.
TinyLex: static n-gram index pruning with perfect recall. Proceedings of the 17th ACM conference on Information and knowledge management.

Abstract

A trigram is a sequence of three characters. In this paper we demonstrate the effectiveness of a trigram-based index for morphologically based retrieval from a full-text document retrieval system. A retrieved document is considered relevant if it contains an exact match for each of the query terms. Under this definition of relevance we consistently achieve a recall rate of 100%. In the experiments described here we used sets of 100 ANDed three-term queries, and the average precision per set varied from 47% to 87%. We propose a method for increasing the average precision to 100%. Using overlapping trigrams extracted from the Brown Corpus [KUCE67] and a character set of 45 elements, we found that the number of entries in a trigram-based index approaches a horizontal asymptote near 11,000. Finally, we show that a trigram-based system provides a reasonable alternative to a word-based one and is superior to it for retrieving word fragments.
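
The behaviour summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`trigrams`, `build_index`, `candidates`, `verified`) and the toy documents are hypothetical, and the final verification pass is only an assumed stand-in for the precision-raising method the paper proposes, which the abstract does not describe. The sketch shows why ANDing the trigram postings of each query term cannot miss a document that contains the term (recall of 100%) yet can return documents in which the trigrams occur only in scattered places (precision below 100%).

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of overlapping three-character sequences in text."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def candidates(index, docs, terms):
    """Documents containing every trigram of every query term (ANDed)."""
    result = set(docs)
    for term in terms:
        for gram in trigrams(term):
            result &= index.get(gram, set())
    return result


def verified(docs, hits, terms):
    """Assumed post-filter: keep candidates where each term is an exact substring."""
    return {d for d in hits if all(t.lower() in docs[d].lower() for t in terms)}


# Hypothetical toy collection, not data from the paper.
docs = {
    1: "these results hold",
    2: "the chest cheese results",   # has the, hes, ese but never "these"
    3: "nothing relevant here",
}
index = build_index(docs)
query = ["these", "results"]

hits = candidates(index, docs, query)
exact = verified(docs, hits, query)
print(sorted(hits), sorted(exact))
```

Document 2 is returned as a candidate because "the", "chest", and "cheese" together supply all three trigrams of "these", but it is dropped by the exact-match check. For scale, a 45-character set allows at most 45^3 = 91,125 distinct trigrams, so the reported asymptote near 11,000 index entries corresponds to roughly 12% of that space.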