The stemming problem, i.e. finding a common stem for different forms of a term, has been studied extensively for English, but considerably less is known for other languages. It has previously been claimed that stemming is essential for highly declensional languages. We report on our stemming experiments for German, where an additional issue is the handling of compounds, which are formed by concatenating several words. Studies of stemming in any language rarely compare more than one or two approaches. This paper makes a major contribution that extends beyond its focus on German by investigating a complete spectrum of approaches, ranging from language-independent to elaborate linguistic methods. The main findings are that stemming is beneficial even with a simple approach, and that carefully designed decompounding (the splitting of compound words) substantially boosts performance. All findings are based on a thorough analysis using a large, reliable test collection.
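To make the idea of decompounding concrete, here is a minimal sketch of dictionary-based compound splitting. It is not the method evaluated in this paper. The toy lexicon `LEXICON`, the linking-element list `LINKING`, the `min_part` threshold, and the rule of preferring the split with the fewest parts are all illustrative assumptions.

```python
from functools import lru_cache

# Hypothetical toy lexicon; a real decompounder would need a large dictionary
# or a word list derived from the document collection.
LEXICON = {"bund", "bahn", "hof", "haupt", "kind", "garten",
           "wirtschaft", "minister"}

# Assumed set of German linking elements ("Fugenelemente") that may appear
# between compound parts, e.g. the "s" in "Wirtschaft-s-minister".
LINKING = ("es", "s", "n", "en", "er", "e")


def split_compound(word, lexicon=LEXICON, min_part=3):
    """Split `word` into lexicon entries, using as few parts as possible.

    An optional linking element may follow any non-final part. If no complete
    split exists, the lowercased word is returned as a single-element list.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(i):
        # Best split of word[i:] as a tuple of parts, or None if none exists.
        if i == len(word):
            return ()
        result = None
        for j in range(i + min_part, len(word) + 1):
            part = word[i:j]
            if part not in lexicon:
                continue
            # Continue right after the part, or after a linking element.
            nexts = [j] + [j + len(link) for link in LINKING
                           if word.startswith(link, j)]
            for k in nexts:
                if k == len(word) and k != j:
                    continue  # a linking element cannot end the word
                rest = best(k)
                if rest is not None and (result is None
                                         or 1 + len(rest) < len(result)):
                    result = (part,) + rest
        return result

    parts = best(0)
    return list(parts) if parts else [word]


for w in ["Bundesbahnhof", "Kindergarten", "Wirtschaftsminister", "Haus"]:
    print(w, "->", split_compound(w))
```

With this toy lexicon, `Bundesbahnhof` splits into `['bund', 'bahn', 'hof']` (the "es" is consumed as a linking element), while `Haus`, which is not in the lexicon, is left unsplit. Preferring the fewest parts is one simple way to limit over-splitting. A practical system would also have to decide whether to index the parts, the whole compound, or both.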