Predicting query difficulty on the web by learning visual clues

  • Authors:
  • Eric C. Jensen; Steven M. Beitzel; David Grossman; Ophir Frieder; Abdur Chowdhury

  • Affiliations:
  • Eric C. Jensen, Steven M. Beitzel, David Grossman, Ophir Frieder: Illinois Institute of Technology Information Retrieval Laboratory, Chicago, IL; Abdur Chowdhury: America Online, Inc., Dulles, VA

  • Venue:
  • Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2005

Abstract

We describe a method for predicting query difficulty in a precision-oriented web search task. Our approach uses visual features from retrieved surrogate document representations (titles, snippets, etc.) to predict retrieval effectiveness for a query. By training a supervised machine learning algorithm on manually evaluated queries, visual clues indicative of relevance are discovered. We show that this approach achieves a moderate correlation of 0.57 with precision-at-10 scores from manual relevance judgments of the top ten documents retrieved by ten web search engines over 896 queries. Our findings indicate that difficulty predictors that have been successful in recall-oriented ad-hoc search, such as clarity metrics, are far less correlated with engine performance in precision-oriented tasks such as this one, yielding a maximum correlation of 0.3. Additionally, relying only on visual clues avoids the need for the collection statistics required by these prior approaches. This enables our approach to be employed in environments where those statistics are unavailable or costly to retrieve, such as metasearch.
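The abstract does not specify the exact feature set or learning algorithm, so the following Python sketch is only illustrative of the general pipeline it describes: featurize each query's top-ten surrogates (titles and snippets), fit a supervised regressor to manually judged precision-at-10 scores, and report Pearson correlation on held-out queries. The surface features (surrogate lengths, query-term overlap), the random-forest learner, and the names `Surrogate`, `featurize`, and `train_and_evaluate` are all assumptions, not the paper's actual design.

```python
# Hypothetical sketch of the kind of pipeline the abstract describes;
# the features and learner are illustrative stand-ins, not the paper's.
from dataclasses import dataclass
from typing import List
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

@dataclass
class Surrogate:
    title: str
    snippet: str

def featurize(query: str, surrogates: List[Surrogate]) -> np.ndarray:
    """Map one query's retrieved surrogates to a fixed-length feature vector.

    Illustrative "visual clue" features: surrogate lengths and the degree
    of query-term overlap with titles and snippets.
    """
    terms = set(query.lower().split())
    title_lens, snip_lens, title_olap, snip_olap = [], [], [], []
    for s in surrogates:
        t_words = set(s.title.lower().split())
        s_words = set(s.snippet.lower().split())
        title_lens.append(len(t_words))
        snip_lens.append(len(s_words))
        title_olap.append(len(terms & t_words) / max(len(terms), 1))
        snip_olap.append(len(terms & s_words) / max(len(terms), 1))
    return np.array([
        np.mean(title_lens), np.mean(snip_lens),
        np.mean(title_olap), np.mean(snip_olap),
        np.std(title_olap), np.std(snip_olap),
    ])

def train_and_evaluate(queries, surrogate_lists, p_at_10):
    """Fit a regressor on judged queries; report Pearson r on held-out ones."""
    X = np.array([featurize(q, s) for q, s in zip(queries, surrogate_lists)])
    y = np.array(p_at_10)  # one manually judged P@10 score per query
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_tr, y_tr)
    r, _ = pearsonr(model.predict(X_te), y_te)
    return model, r
```

Note that, unlike clarity-style predictors, which compare a query's language model against collection statistics, every feature above depends only on the surrogates returned for the query; this is what allows the approach to be applied in settings such as metasearch, where collection statistics are unavailable or costly.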