Predicting query difficulty on the web by learning visual clues

  • Authors:
  • Eric C. Jensen; Steven M. Beitzel; David Grossman; Ophir Frieder; Abdur Chowdhury

  • Affiliations:
  • Eric C. Jensen, Steven M. Beitzel, David Grossman, Ophir Frieder: Illinois Institute of Technology Information Retrieval Laboratory, Chicago, IL; Abdur Chowdhury: America Online, Inc., Dulles, VA

  • Venue:
  • Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2005

Abstract

We describe a method for predicting query difficulty in a precision-oriented web search task. Our approach uses visual features from retrieved surrogate document representations (titles, snippets, etc.) to predict retrieval effectiveness for a query. By training a supervised machine learning algorithm on manually evaluated queries, visual clues indicative of relevance are discovered. We show that this approach achieves a moderate correlation of 0.57 with precision-at-10 scores from manual relevance judgments of the top ten documents retrieved by ten web search engines over 896 queries. Our findings indicate that difficulty predictors that have been successful in recall-oriented ad-hoc search, such as clarity metrics, are far less correlated with engine performance in precision-oriented tasks such as this one, yielding a maximum correlation of 0.3. Additionally, relying only on visual clues avoids the need for the collection statistics required by these prior approaches. This enables our approach to be employed in environments where those statistics are unavailable or costly to retrieve, such as metasearch.
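The abstract does not specify the exact feature set or learning algorithm, so the following Python sketch is only illustrative of the general pipeline it describes: featurize each query's top-ten surrogates (titles and snippets), fit a supervised regressor to manually judged precision-at-10 scores, and report Pearson correlation on held-out queries. The surface features (surrogate lengths, query-term overlap), the random-forest learner, and the names `Surrogate`, `featurize`, and `train_and_evaluate` are all assumptions, not the paper's actual design.

```python
# Hypothetical sketch of the kind of pipeline the abstract describes;
# the features and learner are illustrative stand-ins, not the paper's.
from dataclasses import dataclass
from typing import List
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

@dataclass
class Surrogate:
    title: str
    snippet: str

def featurize(query: str, surrogates: List[Surrogate]) -> np.ndarray:
    """Map one query's retrieved surrogates to a fixed-length feature vector.

    Illustrative "visual clue" features: surrogate lengths and the degree
    of query-term overlap with titles and snippets.
    """
    terms = set(query.lower().split())
    title_lens, snip_lens, title_olap, snip_olap = [], [], [], []
    for s in surrogates:
        t_words = set(s.title.lower().split())
        s_words = set(s.snippet.lower().split())
        title_lens.append(len(t_words))
        snip_lens.append(len(s_words))
        title_olap.append(len(terms & t_words) / max(len(terms), 1))
        snip_olap.append(len(terms & s_words) / max(len(terms), 1))
    return np.array([
        np.mean(title_lens), np.mean(snip_lens),
        np.mean(title_olap), np.mean(snip_olap),
        np.std(title_olap), np.std(snip_olap),
    ])

def train_and_evaluate(queries, surrogate_lists, p_at_10):
    """Fit a regressor on judged queries; report Pearson r on held-out ones."""
    X = np.array([featurize(q, s) for q, s in zip(queries, surrogate_lists)])
    y = np.array(p_at_10)  # one manually judged P@10 score per query
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_tr, y_tr)
    r, _ = pearsonr(model.predict(X_te), y_te)
    return model, r
```

Note that, unlike clarity-style predictors, which compare a query's language model against collection statistics, every feature above depends only on the surrogates returned for the query; this is what allows the approach to be applied in settings such as metasearch, where collection statistics are unavailable or costly.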