Information retrieval
Information retrieval systems operating on free text face difficulties when the word forms used in queries and documents do not match. The usual solution is a “stemming component” that reduces related word forms to a common stem. Such components have been studied extensively for English, but considerably less is known about other languages. It has previously been claimed that stemming is essential for highly declensional languages. We report on our experiments with stemming for German, where an additional issue is the handling of compounds, which are formed by concatenating several words. Beyond its focus on German, the major contribution of our work is the investigation of a complete spectrum of approaches, ranging from language-independent to elaborate linguistic methods. The main findings are that stemming is beneficial even with a simple approach, and that carefully designed decompounding, the splitting of compound words, markedly boosts performance. All findings are based on a thorough analysis using a large, reliable test collection.
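The abstract names two normalization steps, stemming and decompounding, without describing how either works. The Python sketch below illustrates both under explicitly stated assumptions. The suffix list, linking elements, and toy lexicon (`SUFFIXES`, `LINKERS`, `LEXICON`) are hypothetical. The function names `stem`, `decompound`, and `index_terms` are illustrative only and do not refer to the stemmers or decompounder evaluated in the study. The stemmer is a naive longest-suffix stripper. The decompounder is a recursive dictionary-based splitter that tries longer prefixes first and permits an optional linking element, such as the "s" in *Arbeitszeit*.

```python
"""Illustrative sketch of German stemming and decompounding for indexing.

Assumptions (hypothetical, not the methods evaluated in the abstract):
- SUFFIXES is a tiny hand-picked list of German inflectional endings.
- LEXICON is a toy vocabulary; real decompounders need large word lists.
"""
from __future__ import annotations

# Hypothetical inflectional endings, longest first so the longest match wins.
SUFFIXES = ("ern", "em", "en", "er", "es", "e", "n", "s")
# Hypothetical linking elements ("Fugenelemente") that may join compound parts.
LINKERS = ("s", "es", "n", "en")
# Hypothetical toy lexicon of known base words (lowercased).
LEXICON = {"haus", "tür", "schlüssel", "bahn", "hof", "arbeit", "zeit"}


def stem(word: str, min_stem: int = 4) -> str:
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


def decompound(word: str, min_part: int = 3) -> list[str]:
    """Split word into lexicon entries; return [word] if no full split exists."""
    w = word.lower()

    def split(rest: str) -> list[str] | None:
        if rest in LEXICON:
            return [rest]
        # Try longer known prefixes first, leaving at least min_part characters.
        for i in range(len(rest) - min_part, min_part - 1, -1):
            head, tail = rest[:i], rest[i:]
            if head not in LEXICON:
                continue
            # Try no linking element first, then each candidate linker.
            for link in ("",) + LINKERS:
                if tail.startswith(link):
                    parts = split(tail[len(link):])
                    if parts:
                        return [head] + parts
        return None

    return split(w) or [w]


def index_terms(text: str) -> list[str]:
    """Tokenize on whitespace, decompound each token, then stem every part."""
    terms = []
    for token in text.split():
        for part in decompound(token):
            terms.append(stem(part))
    return terms


if __name__ == "__main__":
    print(index_terms("Haustürschlüssel Arbeitszeit"))
```

With this toy lexicon, `index_terms("Haustürschlüssel Arbeitszeit")` yields `['haus', 'tür', 'schlüssel', 'arbeit', 'zeit']`, so a query for *Schlüssel* can match a document that contains only the compound. Likewise, *Hunde*, *Hunden*, and *Hundes* all reduce to `hund`. A practical system would need a much larger lexicon and might also index the unsplit compound. This sketch does neither.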