Learning query and document similarities from click-through bipartite graph with metadata

  • Authors:
  • Wei Wu;Hang Li;Jun Xu

  • Affiliations:
  • Microsoft Research Asia, Beijing, China;Noah's Ark Lab of Huawei Technologies, Hong Kong, China;Noah's Ark Lab of Huawei Technologies, Hong Kong, China

  • Venue:
  • Proceedings of the sixth ACM international conference on Web search and data mining
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

We consider learning query and document similarities from a click-through bipartite graph with metadata on the nodes. The metadata contains multiple types of features of queries and documents. We aim to leverage both the click-through bipartite graph and the features to learn query-document, document-document, and query-query similarities. The challenges include how to model and learn the similarity functions based on the graph data. We propose solving the problems in a principled way. Specifically, we use two different linear mappings to project the queries and documents in two different feature spaces into the same latent space, and take the dot product in the latent space as their similarity. Query-query and document-document similarities can also be naturally defined as dot products in the latent space. We formalize the learning of similarity functions as learning of the mappings that maximize the similarities of the observed query-document pairs on the enriched click-through bipartite graph. When queries and documents have multiple types of features, the similarity function is defined as a linear combination of multiple similarity functions, each based on one type of features. We further solve the learning problem by using a new technique called Multi-view Partial Least Squares (M-PLS). The advantages include the global optimum which can be obtained through Singular Value Decomposition (SVD) and the capability of finding high quality similar queries. We conducted large scale experiments on enterprise search data and web search data. The experimental results on relevance ranking and similar query finding demonstrate that the proposed method works significantly better than the baseline methods.