Regularized query classification using search click information

Authors:
Xiaofei He;Pradhuman Jhala
Affiliations:
Yahoo! Research Labs, 3333 Empire Avenue, Burbank, CA 91504, USA;Yahoo! Research Labs, 3333 Empire Avenue, Burbank, CA 91504, USA
Venue:
Pattern Recognition
Year:
2008

Abstract

Every day, hundreds of millions of users submit queries to Web search engines. User queries are typically very short, which makes query understanding a challenging problem. In this paper, we propose a novel approach to query representation and classification. By submitting a query to a Web search engine, we can represent the query as the set of terms found on the Web pages the search engine returns. In this way, each query can be treated as a point in a high-dimensional space, and standard classification algorithms such as regression can be applied. However, traditional regression is too flexible when there are many highly correlated predictor variables, and it may overfit. By using search click information, the semantic relationship between queries can be incorporated into the learning system as a regularizer. Specifically, among all the functions that minimize the empirical loss on the labeled queries, we select the one that best preserves the semantic relationship between queries. We present experimental evidence suggesting that the regularized regression algorithm is able to use search click information effectively for query classification.
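
The idea of using click information as a regularizer can be illustrated with a small sketch. The abstract does not give the exact objective, so the code below assumes a common penalized form, Laplacian-regularized least squares, in which a graph penalty on the predictions stands in for "preserving the semantic relationship between queries". The function names, the cosine similarity over click vectors, and the parameters `lam` and `ridge` are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def click_graph_weights(clicks):
    """Build a query-query similarity matrix from click counts.

    clicks: (n_queries, n_urls) array of click counts.
    Assumption: similarity is the cosine between click vectors.
    """
    clicks = np.asarray(clicks, dtype=float)
    norms = np.linalg.norm(clicks, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # queries with no clicks get zero similarity
    unit = clicks / norms
    W = unit @ unit.T
    np.fill_diagonal(W, 0.0)  # no self-edges
    return W


def fit_graph_regularized_regression(X, y, labeled, W, lam=1.0, ridge=1e-3):
    """Solve a Laplacian-regularized least-squares problem (assumed form).

    Minimizes, over weight vectors w:
        sum over labeled i of (x_i . w - y_i)^2
        + lam * (Xw)^T L (Xw)      # smoothness over the click graph
        + ridge * ||w||^2          # small ridge term for numerical stability
    where L = D - W is the graph Laplacian of the click graph.
    """
    L = np.diag(W.sum(axis=1)) - W
    X_lab = X[labeled]
    A = X_lab.T @ X_lab + lam * (X.T @ L @ X) + ridge * np.eye(X.shape[1])
    b = X_lab.T @ y[labeled]
    return np.linalg.solve(A, b)


# Toy data: four queries, each described by one distinct term,
# so term features alone cannot relate a labeled query to an unlabeled one.
X = np.eye(4)
# Queries 0 and 1 click the same URL; queries 2 and 3 click another.
clicks = np.array([[3, 0], [1, 0], [0, 2], [0, 5]])
y = np.array([1.0, 0.0, -1.0, 0.0])  # only queries 0 and 2 carry labels
labeled = np.array([0, 2])

W = click_graph_weights(clicks)
w = fit_graph_regularized_regression(X, y, labeled, W)
scores = X @ w
print(np.sign(scores))
```

In this toy setting, the unlabeled queries 1 and 3 receive scores with the same sign as the labeled queries they share clicks with, which is the effect the regularizer is meant to produce. With `lam=0` the unlabeled queries would get a score of zero, since their term features never appear in a labeled query.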