How Effective is Stemming and Decompounding for German Text Retrieval?
Information Retrieval
Empirical methods for compound splitting
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Generating query substitutions
Proceedings of the 15th international conference on World Wide Web
Word normalization and decompounding in mono- and bilingual IR
Information Retrieval
Evaluating verbose query processing techniques
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Recursive question decomposition for answering complex geographic questions
CLEF'09 Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments
CLEF'04 Proceedings of the 5th conference on Cross-Language Evaluation Forum: multilingual Information Access for Text, Speech and Images
Hi-index | 0.00 |
We investigate interpreting coordinations (e.g. word sequences connected with coordinating conjunctions such as "and" and "or") as logical disjunctions of terms to generate a set of disjunctionfree query variants for information retrieval (IR) queries. In addition, so-called hyphen coordinations are resolved by generating full compound forms and rephrasing the original query, e.g. "rice im-and export" is transformed into "rice import and export". Query variants are then processed separately and retrieval results are merged using a standard data fusion technique. We evaluate the approach on German standard IR benchmarking data. The results show that: i) Our proposed approach to generate compounds from hyphen coordinations produces the correct results for all test topics. ii) Our proposed heuristics to identify coordinations and generate query variants based on shallow natural language processing (NLP) techniques is highly accurate on the topics and does not rely on parsing or part-of-speech tagging. iii) Using query variants to produce multiple retrieval results and merging the results decreases precision at top ranks. However, in combination with blind relevance feedback (BRF), this approach can show significant improvement over the standard BRF baseline using the original queries.