Algorithmic stemmers or morphological analysis? An evaluation

Authors:
Claire Fautsch;Jacques Savoy
Affiliations:
Computer Science Department, University of Neuchâtel, 2009 Neuchâtel, Switzerland;Computer Science Department, University of Neuchâtel, 2009 Neuchâtel, Switzerland
Venue:
Journal of the American Society for Information Science and Technology
Year:
2009



Abstract

It is important in information retrieval (IR), information extraction, or classification tasks that morphologically related forms are conflated under the same stem (using a stemmer) or lemma (using a morphological analyzer). To achieve this for the English language, algorithmic stemming and various morphological analysis approaches have been suggested. Based on Cross-Language Evaluation Forum test collections containing 284 queries and various IR models, this article evaluates these word-normalization proposals. Stemming improves the mean average precision significantly, by around 7%, while performance differences are not significant when comparing various algorithmic stemmers with one another or algorithmic stemmers with morphological analysis. Accounting for thesaurus class numbers during indexing does not modify overall retrieval performance. Finally, we demonstrate that including a stop word list, even one containing only around 10 terms, might significantly improve retrieval performance, depending on the IR model. © 2009 Wiley Periodicals, Inc.
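
The notions used in the abstract (conflating related word forms, removing stop words, and scoring a ranking by mean average precision) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_stem` is a crude suffix stripper, not one of the stemmers evaluated in the article, `STOP_WORDS` is a hypothetical ten-term list, and the document identifiers are invented.

```python
# Illustrative sketch only: none of these names or values come from the article.

# Hypothetical short stop word list (about 10 terms).
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "or", "an"}


def toy_stem(word):
    """Strip one common English suffix; a crude stand-in for an algorithmic stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Lowercase, split on whitespace, drop stop words, and stem the rest."""
    return [toy_stem(tok) for tok in text.lower().split() if tok not in STOP_WORDS]


def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant document appears."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranking, relevant set) pairs, one per query."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

With this sketch, "indexed", "indexing", and "indexes" are all conflated to "index", so a query containing one form can match documents containing another. Comparing `mean_average_precision` over the same queries with and without stemming or stop word removal is the kind of comparison the abstract reports, although the article does so with full IR models and significance testing.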