Leveraging integrated information to extract query subtopics for search result diversification

  • Authors:
  • Wei Zheng; Hui Fang; Conglei Yao; Min Wang

  • Affiliations:
  • University of Delaware, Newark, USA; University of Delaware, Newark, USA; Tencent, Beijing, China; HP Labs China, Beijing, China

  • Venue:
  • Information Retrieval
  • Year:
  • 2014

Abstract

Search result diversification aims to diversify search results so that they cover different query subtopics, i.e., pieces of relevant information. State-of-the-art diversification methods often model diversity explicitly based on query subtopics, so their performance is closely tied to the quality of those subtopics. Most existing studies extract query subtopics only from unstructured data such as document collections. However, a large amount of information is also available in structured data, which complements the information in unstructured data. Structured data can provide valuable domain knowledge, but it is currently under-utilized. In this article, we study how to leverage the integrated information from both structured and unstructured data to extract high-quality subtopics for search result diversification. We first discuss how to extract subtopics from structured data. We then propose three methods to integrate structured and unstructured data. Specifically, the first method uses the structured data to guide subtopic extraction from the unstructured data, the second uses the unstructured data to guide extraction from the structured data, and the third first extracts subtopics separately from the two data sources and then combines them. Experimental results in both the Enterprise and Web search domains show that the proposed methods are effective in extracting high-quality subtopics from the integrated information, which can lead to better diversification performance.
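
The third integration strategy (extract subtopics separately, then combine them) can be illustrated with a minimal sketch. The abstract does not say how the combination is done, so everything below is an assumption made for illustration only: subtopics are represented as sets of terms, similarity is measured with Jaccard overlap, and the names `jaccard`, `combine_subtopics`, and the `threshold` parameter are hypothetical rather than taken from the article.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combine_subtopics(
    structured: List[Set[str]],
    unstructured: List[Set[str]],
    threshold: float = 0.5,  # assumed cut-off, not a value from the article
) -> List[Set[str]]:
    """Merge two independently extracted subtopic lists.

    Assumption: a subtopic is a set of terms. Each subtopic from the
    unstructured source is merged into the most similar subtopic seen so
    far when the similarity reaches the threshold; otherwise it is kept
    as a new subtopic.
    """
    combined = [set(s) for s in structured]  # copy so inputs stay unchanged
    for candidate in unstructured:
        best_idx, best_sim = -1, 0.0
        for i, existing in enumerate(combined):
            sim = jaccard(existing, candidate)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx >= 0 and best_sim >= threshold:
            combined[best_idx] |= candidate  # same subtopic: pool the terms
        else:
            combined.append(set(candidate))  # new subtopic
    return combined


if __name__ == "__main__":
    # Toy subtopics for the ambiguous query "java" (made-up data).
    from_structured = [
        {"java", "programming", "language"},
        {"java", "island", "indonesia"},
    ]
    from_unstructured = [
        {"java", "programming", "tutorial"},
        {"java", "coffee"},
    ]
    for subtopic in combine_subtopics(from_structured, from_unstructured):
        print(sorted(subtopic))
```

In this toy run the "programming tutorial" subtopic is pooled with the structured "programming language" subtopic, while "coffee" survives as a separate subtopic, so the combined list covers more aspects of the query than either source alone. A real system would likely use a richer subtopic representation and similarity measure than this sketch assumes.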