Discovering missing click-through query language information for web search

  • Authors:
  • Xing Yi;James Allan

  • Affiliations:
  • University of Massachusetts, Amherst, AMHERST, MA, USA;University of Massachusetts, Amherst, AMHERST, MA, USA

  • Venue:
  • Proceedings of the 20th ACM international conference on Information and knowledge management
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

The click-through information in web query logs has been widely used for web search tasks. However, it usually suffers from the data sparseness problem, known as the missing/incomplete click problems, where large volume of pages receive few or no clicks. In this paper, we adapt two language modeling based approaches to address this issue in the context of using web query logs for web search. The first approach discovers missing click-through query language features for web pages with no or few clicks from their similar pages' click-associated queries in the query logs, to help search. We further propose combining this content based approach with the random walk approach on the click graph to further reduce click-through sparseness for search. The second approach follows the query expansion method and utilizes the queries and their clicked web pages in the query logs to reconstruct a structured variant of the relevance based language models for each user-input query for search. We design experiments with a publicly available query log excerpt and two TREC web search tasks on the GOV2 and ClueWeb09 corpora to evaluate the search performance of different approaches. Our results show that using discovered semantic click-through query language features can statistically significantly improve search performance, compared with the baselines that do not use the discovered information. The combination approach that uses discovered click-through features from both random walk and the content based approach can further improve search performance.