Learning similarity function for rare queries

Authors:
Jingfang Xu; Gu Xu
Affiliations:
Microsoft Research Asia, Beijing, China; Microsoft Research Asia, Beijing, China
Venue:
Proceedings of the fourth ACM international conference on Web search and data mining
Year:
2011

Cited by 8

Empirical Study on Rare Query Characteristics
WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01

Machine learning for query-document matching in search
Proceedings of the fifth ACM international conference on Web search and data mining

Efficient query recommendations in the long tail via center-piece subgraphs
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval

Beyond bag-of-words: machine learning for query-document matching in web search
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval

Collaborative ranking: improving the relevance for tail queries
Proceedings of the 21st ACM international conference on Information and knowledge management

Learning query and document similarities from click-through bipartite graph with metadata
Proceedings of the sixth ACM international conference on Web search and data mining

Query expansion using path-constrained random walks
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Mining search and browse logs for web search: A Survey
ACM Transactions on Intelligent Systems and Technology (TIST) - Survey papers, special sections on the semantic adaptive social web, intelligent systems for health informatics, regular papers

Abstract

The key element of many query processing tasks, including query suggestion, query reformulation, and query expansion, can be formalized as the calculation of similarity between queries. Although many methods have been proposed for calculating query similarity, they can perform poorly on rare queries. To the best of our knowledge, no previous work has specifically addressed similarity calculation for rare queries, and this paper studies that problem. Specifically, we address three problems. First, we define an n-gram space in which each query is represented by its own content, together with a similarity function that measures the similarity between queries in that space. Second, we propose to learn the similarity function from training data derived from user behavior data; we formalize the learning task as an optimization problem and solve it efficiently with a metric learning approach. Finally, we exploit locality sensitive hashing to retrieve similar queries efficiently from a large query repository. Our experiments verify the effectiveness of the proposed approach, showing that it improves the accuracy of query similarity calculation for rare queries and retrieves similar queries efficiently. As an application, we also show experimentally that the similar queries found by our method can significantly improve search relevance.
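The abstract names three components: an n-gram representation of queries, a similarity function over that space, and locality sensitive hashing for retrieval. The Python sketch below shows one way such pieces can fit together; it is not the authors' implementation. The word unigram/bigram features, the weighted-cosine form of the similarity, the hand-set `weights` dictionary (standing in for parameters that metric learning would produce from user behavior data), and the use of random-hyperplane (SimHash-style) LSH with banded hash tables are all assumptions made for illustration, and names such as `LSHIndex`, `signature`, and `hyperplane` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def ngram_vector(query, n_max=2):
    """Sparse word n-gram counts of a query (unigrams + bigrams: an assumption)."""
    words = query.lower().split()
    vec = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            vec[" ".join(words[i:i + n])] += 1
    return vec


def reweight(vec, weights):
    """Scale each n-gram by sqrt(w) (w >= 0, default 1.0), so that plain cosine on the
    scaled vectors equals cosine under the diagonal metric diag(w)."""
    return {g: c * math.sqrt(weights.get(g, 1.0)) for g, c in vec.items()}


def cosine(u, v):
    dot = sum(x * v.get(g, 0.0) for g, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity(q1, q2, weights):
    """Hypothetical similarity function: weighted cosine in the n-gram space."""
    return cosine(reweight(ngram_vector(q1), weights),
                  reweight(ngram_vector(q2), weights))


def hyperplane(j, gram):
    """Coordinate of random Gaussian hyperplane j along n-gram `gram`, generated
    lazily and deterministically so the sparse space is never enumerated."""
    return random.Random(f"{j}|{gram}").gauss(0.0, 1.0)


def signature(vec, n_bits):
    """Random-hyperplane (SimHash-style) signature: one sign bit per hyperplane."""
    return [1 if sum(x * hyperplane(j, g) for g, x in vec.items()) >= 0 else 0
            for j in range(n_bits)]


class LSHIndex:
    """n_tables hash tables, each keyed by a disjoint band of band_bits signature bits."""

    def __init__(self, weights, n_tables=4, band_bits=8):
        self.weights = weights
        self.n_tables = n_tables
        self.band_bits = band_bits
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _keys(self, query):
        vec = reweight(ngram_vector(query), self.weights)
        bits = signature(vec, self.n_tables * self.band_bits)
        b = self.band_bits
        return [tuple(bits[t * b:(t + 1) * b]) for t in range(self.n_tables)]

    def add(self, query):
        for table, key in zip(self.tables, self._keys(query)):
            table[key].append(query)

    def search(self, query, top_k=3):
        # Union the buckets the query falls into, then re-rank by exact similarity.
        candidates = {q for table, key in zip(self.tables, self._keys(query))
                      for q in table.get(key, [])}
        return sorted(candidates,
                      key=lambda q: -similarity(query, q, self.weights))[:top_k]


if __name__ == "__main__":
    # Hypothetical weights standing in for a learned metric (assumption).
    weights = {"cheap": 0.5, "flights": 2.0, "paris": 2.0}
    index = LSHIndex(weights)
    for q in ["cheap flights to paris", "flights to paris",
              "paris hotels", "python tutorial"]:
        index.add(q)
    print(index.search("cheap flights paris"))
    print(round(similarity("cheap flights paris", "flights to paris", weights), 3))
```

Re-ranking the LSH candidates with the exact similarity keeps each table cheap to probe while the final order still reflects the (assumed) learned weights. Adding tables raises the chance that a truly similar query collides with the probe in at least one of them, while adding bits per table shrinks each bucket.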