Mining search engine clickthrough log for matching N-gram features

  • Authors:
  • Huihsin Tseng; Longbin Chen; Fan Li; Ziming Zhuang; Lei Duan; Belle Tseng

  • Affiliations:
  • Yahoo! Inc., Santa Clara, CA (all six authors)

  • Venue:
  • EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
  • Year:
  • 2009

Abstract

User clicks on a URL in response to a query are extremely useful predictors of the URL's relevance to that query. However, exact-match click features tend to suffer from severe data sparsity in web ranking. This sparsity is particularly pronounced for new URLs and long queries, where each distinct query-URL pair rarely occurs. To remedy this, we present a set of straightforward yet informative query-URL n-gram features that allow limited user click data to generalize to large numbers of unseen query-URL pairs. The method is motivated by techniques used in the NLP community for dealing with unseen words. We find that there are interesting regularities across queries and their preferred destination URLs; for example, queries containing "form" tend to lead to clicks on URLs containing "pdf". We evaluate our set of new query-URL features on a web search ranking task and obtain improvements that are statistically significant at a p-value
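
To make the idea of query-URL n-gram features concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy click log of (query, URL, clicked) triples, splits queries into word unigrams and bigrams, splits URLs into alphanumeric tokens, and scores an unseen query-URL pair by the average smoothed click rate of its n-gram pairs. The function names, the smoothing constants alpha and beta, the averaging step, and the example.* URLs are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import product


def query_ngrams(query, max_n=2):
    """Word n-grams (n = 1..max_n) of a lowercased query."""
    tokens = query.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def url_tokens(url):
    """Alphanumeric tokens of a lowercased URL (hypothetical tokenization)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def build_pair_counts(click_log):
    """Count impressions and clicks for every (query n-gram, URL token) pair."""
    clicks, views = Counter(), Counter()
    for query, url, clicked in click_log:
        for pair in product(set(query_ngrams(query)), set(url_tokens(url))):
            views[pair] += 1
            if clicked:
                clicks[pair] += 1
    return clicks, views


def ngram_click_score(query, url, clicks, views, alpha=1.0, beta=2.0):
    """Mean smoothed click rate over all n-gram pairs of (query, url).

    Unseen pairs fall back to the prior alpha / beta (an assumed smoothing
    scheme), so the score is defined even when the exact pair never occurred.
    """
    pairs = list(product(set(query_ngrams(query)), set(url_tokens(url))))
    if not pairs:
        return alpha / beta
    rates = [(clicks[p] + alpha) / (views[p] + beta) for p in pairs]
    return sum(rates) / len(rates)


# Toy click log (fabricated purely for illustration).
log = [
    ("tax form", "http://irs.example/f1040.pdf", True),
    ("tax form", "http://irs.example/news.html", False),
    ("visa form", "http://travel.example/ds160.pdf", True),
    ("visa form", "http://travel.example/faq.html", False),
]
clicks, views = build_pair_counts(log)

# Neither of these exact query-URL pairs appears in the log.
score_pdf = ngram_click_score(
    "passport form", "http://state.example/ds11.pdf", clicks, views)
score_html = ngram_click_score(
    "passport form", "http://state.example/about.html", clicks, views)
print(score_pdf > score_html)
```

In this toy log, the pair ("form", "pdf") was clicked every time it was shown, so the unseen PDF URL scores higher than the unseen HTML URL for the query "passport form". An exact-match click feature would give both pairs no signal at all.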