Learning dense models of query similarity from user click logs

Authors:
Fabio De Bona; Stefan Riezler; Keith Hall; Massimiliano Ciaramita; Amaç Herdaǧdelen; Maria Holmqvist
Affiliations:
Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany; Google Research, Zürich, Switzerland; Google Research, Zürich, Switzerland; Google Research, Zürich, Switzerland; University of Trento, Rovereto, Italy; Linköping University, Linköping, Sweden
Venue:
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year:
2010

Abstract

The goal of this work is to integrate query similarity metrics as features into a dense model that can be trained on large amounts of query log data, in order to rank query rewrites. We propose features that incorporate various notions of syntactic and semantic similarity in a generalized edit distance framework. We use the implicit feedback of user clicks on search results as weak labels in training linear ranking models on large data sets. We optimize different ranking objectives in a stochastic gradient descent framework. Our experiments show that a pairwise SVM ranker trained on multipartite rank levels outperforms other pairwise and listwise ranking methods under a variety of evaluation metrics.
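The abstract describes the training recipe only at a high level. The following is a minimal sketch of a pairwise SVM ranker trained by stochastic gradient descent over multipartite rank levels, and it rests on several assumptions. Click-derived weak labels are assumed to have already been mapped to integer rank levels per query. Each candidate rewrite is assumed to be represented by a dense feature vector; the two features used below (a semantic similarity and an edit cost) are hypothetical stand-ins for the generalized edit distance features. The update is plain constant-step SGD on the L2-regularized pairwise hinge loss, written for clarity and not taken from the authors' implementation. The function names, the step size `eta`, the regularization weight `lam`, and the toy data are all illustrative.

```python
import random
from itertools import combinations


def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def multipartite_pairs(groups):
    """Yield (better, worse) feature-vector pairs.

    `groups` holds one list per query; each entry is a (level, features)
    tuple where a higher level means a better rewrite. Pairs are formed only
    between rewrites of the same query at different levels; ties yield none.
    """
    for items in groups:
        for (la, xa), (lb, xb) in combinations(items, 2):
            if la > lb:
                yield xa, xb
            elif lb > la:
                yield xb, xa


def train_pairwise_svm(groups, dim, epochs=50, lam=0.01, eta=0.1, seed=0):
    """Constant-step SGD on the L2-regularized pairwise hinge loss
    lam/2 * ||w||^2 + max(0, 1 - w . (x_better - x_worse))."""
    pairs = list(multipartite_pairs(groups))
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(pairs)  # visit pairs in a fresh random order each epoch
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            violated = dot(w, diff) < 1.0  # margin computed before updating w
            for k in range(dim):
                # regularizer gradient lam*w, plus -diff if the hinge is active
                grad = lam * w[k] - (diff[k] if violated else 0.0)
                w[k] -= eta * grad
    return w


def rank(w, candidates):
    """Order candidate feature vectors by descending linear score w . x."""
    return sorted(candidates, key=lambda x: dot(w, x), reverse=True)


if __name__ == "__main__":
    # Hypothetical features per rewrite: [semantic similarity, edit cost]
    groups = [
        [(2, [0.9, 0.1]), (1, [0.5, 0.4]), (0, [0.1, 0.9])],
        [(2, [0.8, 0.2]), (0, [0.2, 0.7]), (1, [0.6, 0.3])],
    ]
    w = train_pairwise_svm(groups, dim=2)
    print("weights:", w)
    print("ranked:", rank(w, [[0.8, 0.2], [0.2, 0.7], [0.6, 0.3]]))
```

In the toy data, better-ranked rewrites have higher similarity and lower edit cost. On that data the learned weight comes out positive on the first feature and negative on the second, and `rank` orders each query's candidates by level. Pairs are drawn across every pair of distinct levels within a query. This is the difference between the multipartite setup and a bipartite clicked/unclicked split, which would use only two levels.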