Machine learning for query-document matching in search

Authors:
Hang Li;Jun Xu
Affiliations:
Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China
Venue:
Proceedings of the fifth ACM international conference on Web search and data mining
Year:
2012

Citing 13
Cited 0

An improved error model for noisy channel spelling correction

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Online expansion of rare queries for sponsored search

Proceedings of the 18th international conference on World wide web
Named entity recognition in query

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Classification-enhanced ranking

Proceedings of the 19th international conference on World wide web
Learning similarity function for rare queries

Proceedings of the fourth ACM international conference on Web search and data mining
LambdaMerge: merging the results of query reformulations

Proceedings of the fourth ACM international conference on Web search and data mining
A kernel approach to addressing term mismatch

Proceedings of the 20th international conference companion on World wide web
A fast and accurate method for approximate string search

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Parameterized concept weighting in verbose queries

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Learning relevance from heterogeneous social network and its application in online targeting
Clickthrough-based latent semantic models for web search

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Regularized latent semantic indexing

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Learning a Robust Relevance Model for Search Using Kernel Methods

The Journal of Machine Learning Research

Quantified Score

Hi-index	0.00

Visualization

Abstract

In web search, relevance is one of the most important factors to meet users' satisfaction, and the success of a web search engine heavily depends on its performance on relevance. It has been observed that many hard cases in search relevance are due to term mismatch between query and documnt (e.g., query 'ny times' does not match well with document only containing 'new york times'), and thus it is not exaggerated to say that dealing with mismatch between query and document is one of the most critical research problems in web search. Recently researchers have spent significant effort to address the grand challenge. The major approach is to conduct more query and document understanding, and perform better matching between enriched query and document representations. With the availability of large amount of log data and advanced machine learning techniques, this becomes more feasible and significant progress has been made recently. In this tutorial, we will give a systematic and detailed presentation on newly developed machine learning technologies for query document matching in search. We will focus on the fundamental problems, as well as the novel solutions for query document matching at word form level, word sense level, topic level, and structure level. We will talk about novel technologies about query spelling error correction [3, 13], query rewriting [1, 4, 6, 7], query classification [2], topic modeling of documents [5, 9], query document matching [8, 10, 11, 12], and query document-title translation. The ideas and solutions introduced in this tutorial may motivate industrial practitioners to turn the research fruits into product reality. The summary of the state-of-the-art methods and the discussions on the technical issues in this tutorial may stimulate academic researchers to find new research directions and solutions. Matching between query and document is not limited to search, and similar problems can be observed at online advertisement, recommendation system, and other applications, as matching between objects from two spaces. The technologies we introduce can be generalized into more general machine learning techniques, which we call learning to match.