Regularized query classification using search click information

  • Authors:
  • Xiaofei He;Pradhuman Jhala

  • Affiliations:
  • Yahoo! Research Labs, 3333 Empire Avenue, Burbank, CA 91504, USA;Yahoo! Research Labs, 3333 Empire Avenue, Burbank, CA 91504, USA

  • Venue:
  • Pattern Recognition
  • Year:
  • 2008

Quantified Score

Hi-index 0.01

Visualization

Abstract

Hundreds of millions of users each day submit queries to the Web search engine. The user queries are typically very short which makes query understanding a challenging problem. In this paper, we propose a novel approach for query representation and classification. By submitting the query to a web search engine, the query can be represented as a set of terms found on the web pages returned by search engine. In this way, each query can be considered as a point in high-dimensional space and standard classification algorithms such as regression can be applied. However, traditional regression is too flexible in situations with large numbers of highly correlated predictor variables. It may suffer from the overfitting problem. By using search click information, the semantic relationship between queries can be incorporated into the learning system as a regularizer. Specifically, from all the functions which minimize the empirical loss on the labeled queries, we select the one which best preserves the semantic relationship between queries. We present experimental evidence suggesting that the regularized regression algorithm is able to use search click information effectively for query classification.