Indexing according to occurrences of selected word fragments, called “n-grams”, offers a significant alternative to keyword indexing and full text scanning methods in the design of information systems based on documents. Finite sets of n-grams can be selected to allow effective fixed indexing of all words, numbers, and special terms in text. The characteristics of such indexing can be modeled statistically and validated over a wide range of text. The model provides a descriptive and predictive tool for controlling precision and recall in searching and for scaling estimates of relevance to an adaptive reference noise distribution for a target collection. Special techniques such as partial inversion of index terms, probabilistic ordering of index terms, and various types of data compression allow n-gram indexing to be competitive in performance with other approaches.
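The basic idea of indexing by word fragments can be illustrated with a minimal sketch. It is not the authors' method: the function names, the `_` padding character, the choice of trigrams, and the sample documents are all assumptions made for illustration. The statistical model, partial inversion, probabilistic ordering, and compression mentioned in the abstract are not shown. The sketch builds an inverted index from character n-grams to document ids, optionally restricted to a fixed set of n-grams, and returns candidate documents for a query by intersecting posting sets.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of character n-grams of each word in text.

    Each word is lowercased and padded with '_' (an illustrative choice)
    so that fragments at word boundaries are distinguishable.
    """
    grams = set()
    for word in text.lower().split():
        padded = f"_{word}_"
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    return grams


def build_index(docs, vocabulary=None, n=3):
    """Map each n-gram to the set of ids of documents that contain it.

    docs is a dict of {doc_id: text}. If vocabulary is given, only
    n-grams in that fixed set are indexed (hypothetical parameter).
    Returns the index and the set of all document ids.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text, n):
            if vocabulary is None or gram in vocabulary:
                index[gram].add(doc_id)
    return index, set(docs)


def candidates(index, all_ids, query, vocabulary=None, n=3):
    """Return ids of documents containing every indexed n-gram of query.

    The result is a superset of the true matches: a document can contain
    all the fragments without containing the query word itself, so a
    real system would still verify candidates against the text.
    """
    result = set(all_ids)
    for gram in ngrams(query, n):
        if vocabulary is None or gram in vocabulary:
            result &= index.get(gram, set())
    return result


docs = {
    1: "n-gram indexing of text",
    2: "keyword indexing",
    3: "full text scanning",
}
index, ids = build_index(docs)
print(candidates(index, ids, "indexing"))
print(candidates(index, ids, "text"))
```

In this sketch, restricting `vocabulary` to fewer n-grams shrinks the index but lets more non-matching documents through as candidates, which is one simple way to see the trade-off between index size and precision that the abstract refers to.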