Interpretation of coordinations, compound generation, and result fusion for query variants

Authors:
Johannes Leveling
Affiliations:
Dublin City University, Dublin, Ireland
Venue:
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Year:
2013

Abstract

We investigate interpreting coordinations (e.g. word sequences connected with coordinating conjunctions such as "and" and "or") as logical disjunctions of terms to generate a set of disjunction-free query variants for information retrieval (IR) queries. In addition, so-called hyphen coordinations are resolved by generating full compound forms and rephrasing the original query, e.g. "rice im- and export" is transformed into "rice import and export". Query variants are then processed separately and retrieval results are merged using a standard data fusion technique. We evaluate the approach on standard German IR benchmarking data. The results show that: i) our proposed approach to generating compounds from hyphen coordinations produces the correct results for all test topics; ii) our proposed heuristics for identifying coordinations and generating query variants, which are based on shallow natural language processing (NLP) techniques, are highly accurate on the topics and do not rely on parsing or part-of-speech tagging; iii) using query variants to produce multiple retrieval results and merging the results decreases precision at top ranks. However, in combination with blind relevance feedback (BRF), this approach can show significant improvement over the standard BRF baseline using the original queries.
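
The three steps named in the abstract (resolving hyphen coordinations, expanding coordinations into disjunction-free variants, and fusing the per-variant results) can be illustrated with a minimal Python sketch. It is not the author's implementation. Everything in it is an assumption made for illustration: the function names, the English conjunction list, the toy vocabulary used to validate generated compounds, the requirement that the hyphenated fragment is separated from the conjunction by whitespace, and the choice of CombSUM with min-max normalisation as the fusion method (the abstract only says "a standard data fusion technique").

```python
from collections import defaultdict
from itertools import product

# Assumed conjunction list; German topics would need "und" and "oder".
CONJUNCTIONS = {"and", "or"}


def resolve_hyphen_coordination(query, vocabulary):
    """Rewrite 'im- and export' as 'import and export'.

    Hypothetical heuristic: append ever shorter suffixes of the full word
    to the hyphenated fragment until the result is a known word.
    """
    tokens = query.split()
    out = list(tokens)
    for i in range(len(tokens) - 2):
        part, conj, full = tokens[i], tokens[i + 1], tokens[i + 2]
        if part.endswith("-") and len(part) > 1 and conj.lower() in CONJUNCTIONS:
            stem = part[:-1]
            for k in range(1, len(full)):
                candidate = stem + full[k:]
                if candidate.lower() in vocabulary:
                    out[i] = candidate
                    break
    return " ".join(out)


def query_variants(query):
    """Read each 'X and/or Y' word pair as a disjunction and expand it."""
    tokens = query.split()
    slots = []  # one list of alternatives per query position
    i = 0
    while i < len(tokens):
        if i + 2 < len(tokens) and tokens[i + 1].lower() in CONJUNCTIONS:
            alternatives = [tokens[i], tokens[i + 2]]
            i += 3
            # Continue through chains such as "a or b or c".
            while i + 1 < len(tokens) and tokens[i].lower() in CONJUNCTIONS:
                alternatives.append(tokens[i + 1])
                i += 2
            slots.append(alternatives)
        else:
            slots.append([tokens[i]])
            i += 1
    return [" ".join(choice) for choice in product(*slots)]


def comb_sum(runs):
    """Fuse result lists (doc id -> score) by summing min-max normalised scores."""
    fused = defaultdict(float)
    for run in runs:
        if not run:
            continue
        low, high = min(run.values()), max(run.values())
        span = high - low
        for doc, score in run.items():
            fused[doc] += (score - low) / span if span else 1.0
    # Highest fused score first; ties broken by document id.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    toy_vocabulary = {"import", "export"}  # hypothetical lexicon
    resolved = resolve_hyphen_coordination("rice im- and export", toy_vocabulary)
    print(resolved)
    print(query_variants(resolved))
    # Made-up retrieval scores, one result list per query variant.
    print(comb_sum([{"d1": 3.0, "d2": 2.0, "d3": 1.0}, {"d2": 4.0, "d4": 2.0}]))
```

In this sketch each variant would be sent to the retrieval engine on its own, and `comb_sum` would merge the resulting ranked lists. A real system would replace the toy vocabulary with a proper lexicon or collection vocabulary and would need rules for multi-word conjuncts, which this pairwise expansion ignores.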