Single n-gram stemming

Authors:
James Mayfield;Paul McNamee
Affiliations:
The Johns Hopkins University, Laurel MD;The Johns Hopkins University, Laurel MD
Venue:
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2003

References (4):

One term or two?
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval

Guessing morphology from terms and corpora
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval

Corpus-based stemming using cooccurrence of word variants
ACM Transactions on Information Systems (TOIS)

An algorithm for suffix stripping
Readings in information retrieval

Cited by (16):

n-gram/2L: a space and time efficient two-level n-gram inverted index structure
VLDB '05 Proceedings of the 31st international conference on Very large data bases

Developing an automatic linguistic truncation operator for best-match retrieval of Finnish in inflected word form text database indexes
Journal of Information Science

Restricted inflectional form generation in management of morphological keyword variation
Information Retrieval

Structural optimization of a full-text n-gram index using relational normalization
The VLDB Journal: The International Journal on Very Large Data Bases

TinyLex: static n-gram index pruning with perfect recall
Proceedings of the 17th ACM conference on Information and knowledge management

Addressing morphological variation in alphabetic languages
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

JHU ad hoc experiments at CLEF 2008
CLEF'08 Proceedings of the 9th Cross-language evaluation forum conference on Evaluating systems for multilingual and multimodal information access

Managing misspelled queries in IR applications
Information Processing and Management: an International Journal

CRTER: using cross terms to enhance probabilistic information retrieval
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

A novel corpus-based stemming algorithm using co-occurrence statistics
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

An unsupervised method to improve Spanish stemmer
NLDB'11 Proceedings of the 16th international conference on Natural language processing and information systems

Is a morphologically complex language really that complex in full-text retrieval?
FinTAL'06 Proceedings of the 5th international conference on Advances in Natural Language Processing

Exploring new languages with HAIRCUT at CLEF 2005
CLEF'05 Proceedings of the 6th international conference on Cross-Language Evaluation Forum: accessing Multilingual Information Repositories

STEMBR: a stemming algorithm for the Brazilian Portuguese language
EPIA'05 Proceedings of the 12th Portuguese conference on Progress in Artificial Intelligence

Tools for nominalization: an alternative for lexical normalization
PROPOR'06 Proceedings of the 7th international conference on Computational Processing of the Portuguese Language

Effective and Robust Query-Based Stemming
ACM Transactions on Information Systems (TOIS)

Abstract

Stemming can improve retrieval accuracy, but stemmers are language-specific. Character n-gram tokenization achieves many of the benefits of stemming in a language-independent way, but its use incurs a performance penalty. We demonstrate that selecting a single n-gram as a pseudo-stem for a word can be an effective and efficient language-neutral approach for some languages.
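
The abstract does not say how the single n-gram is chosen for each word. As an illustration only, the Python sketch below assumes one plausible criterion: pick the character n-gram of the word that occurs in the fewest documents, on the intuition that rare n-grams are the most discriminating. The function names (`char_ngrams`, `document_frequencies`, `pseudo_stem`), the toy corpus, the choice of n = 4, and the leftmost tie-break are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def char_ngrams(word, n):
    """Return the character n-grams of a word; words of length <= n yield themselves."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def document_frequencies(docs, n):
    """Count, for each n-gram, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        grams = set()
        for word in doc.lower().split():
            grams.update(char_ngrams(word, n))
        df.update(grams)
    return df


def pseudo_stem(word, df, n):
    """Pick the n-gram of `word` with the lowest document frequency.

    Assumed criterion, not confirmed by the abstract. Ties go to the
    leftmost n-gram, because min() returns the first minimal element.
    """
    grams = char_ngrams(word.lower(), n)
    return min(grams, key=lambda g: df.get(g, 0))


# Toy corpus: each string stands for one document.
docs = [
    "juggling juggler juggles",
    "running jumping singing",
    "the runner jumps",
]
df = document_frequencies(docs, 4)

for w in ["juggling", "juggler", "juggles", "running", "runner"]:
    print(w, "->", pseudo_stem(w, df, 4))
```

Each word is indexed under exactly one term, as with a conventional stemmer, rather than under every n-gram it contains. On a corpus this small the document frequencies carry little signal, so the selected n-grams should not be read as representative of behavior on a real collection.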