Pseudo relevance feedback using semantic clustering in relevance language model

  • Authors:
  • Qiang Pu;Daqing He

  • Affiliations:
  • University of Electronic Science and Technology of China, Chengdu, China;University of Pittsburgh, Pittsburgh, PA, USA

  • Venue:
  • Proceedings of the 18th ACM conference on Information and knowledge management
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Pseudo relevance feedback has demonstrated to be in general an effective technique for improving retrieval effectiveness, but the noise in the top retrieved documents still can cause topic drift problem that affects the performance of certain topics. By viewing a document as an interaction of a set of independent hidden topics, we propose a novel semantic clustering technique using independent component analysis. Then within the language modeling framework, we apply the obtained semantic topic clusters into the query sampling process so that the sampling depends on the activated topics rather than on the individual document language model. Therefore, we obtain a semantic cluster based relevance language model, which uses pseudo relevance feedback technique without requiring any relevance training information. We applied the model on five TREC data sets. The experiments show that our model can significantly improve retrieval performance over traditional language models including relevance-based and clustering-based retrieval language models. The main contribution of the improvements comes from the estimation of the relevance model on the semantic clusters that are closely related to the query.