Co-occurrence based predictors for estimating query difficulty

Authors:
Hazra Imran;Aditi Sharan
Venue:
ICDMW '10 Proceedings of the 2010 IEEE International Conference on Data Mining Workshops
Year:
2010

Cited by:
Learning to judge image search results. MM '11 Proceedings of the 19th ACM international conference on Multimedia
When video search goes wrong: predicting query failure using search engine logs and visual search results. Proceedings of the 20th ACM international conference on Multimedia

Abstract

Query difficulty prediction aims to identify, in advance, how reliably an information retrieval system will perform when faced with a particular user request. Predicting the difficulty level of a query is an interesting and important issue in Information Retrieval (IR) and remains an open research problem. To appreciate the importance of query difficulty prediction, we present an example. Information Retrieval is the science of searching for relevant documents based on a user’s need, and a way of discovering knowledge from text data. A user’s needs are often expressed as a query. It has been observed that a word mismatch problem arises when matching a user’s query to documents, because users and document authors do not use the same vocabulary. Query expansion/reformulation is a method for overcoming this mismatch in terminology. Query expansion (QE) has become a well-known technique that has been shown to improve average retrieval performance. However, despite extensive research, QE does not provide consistent gains across different query sets and collections. This technique has therefore not been adopted in many operational systems, as it may degrade the performance of individual queries. A thorough investigation into the robustness of query expansion is required to ensure its reliability for individual queries. It is well known in the Information Retrieval community that methods such as query expansion can help “easy” queries but are detrimental to “hard” queries. If the performance of queries can be predicted before retrieval, then specific measures can be taken to improve the overall performance of the system. In this paper we thoroughly investigate various query difficulty predictors and suggest two new query predictors based on co-occurrence of query terms. To evaluate the predictors, we have experimented on standard TREC collections.
Our work is significant as it is a step towards judging the reliability and robustness of query processing operations such as query expansion.
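
As a rough illustration of what a pre-retrieval predictor based on co-occurrence of query terms could look like, the sketch below scores a query by the average Jaccard overlap between the sets of documents containing each pair of its terms. The abstract does not define the two proposed predictors, so the Jaccard-based score, the function names `doc_sets` and `avg_cooccurrence`, and the toy collection are all assumptions made for illustration, not the authors' method.

```python
from itertools import combinations


def doc_sets(collection):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in enumerate(collection):
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


def avg_cooccurrence(query, index):
    """Mean Jaccard overlap of document sets over all pairs of distinct query terms.

    Hypothetical predictor: a higher value means the query terms tend to
    appear in the same documents.
    """
    terms = sorted(set(query.lower().split()))
    pairs = list(combinations(terms, 2))
    if not pairs:
        return 0.0  # a single-term query has no pairs to score
    total = 0.0
    for a, b in pairs:
        docs_a, docs_b = index.get(a, set()), index.get(b, set())
        union = docs_a | docs_b
        total += len(docs_a & docs_b) / len(union) if union else 0.0
    return total / len(pairs)


# Toy collection (assumed data, whitespace tokenization only).
collection = [
    "query expansion improves retrieval",
    "query expansion can hurt hard queries",
    "image search results",
]
index = doc_sets(collection)
print(avg_cooccurrence("query expansion", index))  # terms share all their documents
print(avg_cooccurrence("query image", index))      # terms never appear together
```

Under this assumed scoring, a low value could flag a query whose terms rarely appear together, which is one plausible signal that expansion may be risky for it. Whether that matches the behaviour of the paper's predictors cannot be determined from the abstract alone.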