Pseudo relevance feedback using semantic clustering in relevance language model

Authors: Qiang Pu; Daqing He
Affiliations: University of Electronic Science and Technology of China, Chengdu, China; University of Pittsburgh, Pittsburgh, PA, USA
Venue: Proceedings of the 18th ACM Conference on Information and Knowledge Management
Year: 2009

Abstract

Pseudo relevance feedback has been shown to be a generally effective technique for improving retrieval effectiveness, but noise in the top retrieved documents can still cause topic drift, which hurts performance on certain topics. Viewing a document as an interaction of a set of independent hidden topics, we propose a novel semantic clustering technique based on independent component analysis. Within the language modeling framework, we then apply the resulting semantic topic clusters to the query sampling process, so that sampling depends on the activated topics rather than on individual document language models. This yields a semantic cluster-based relevance language model, which uses pseudo relevance feedback without requiring any relevance training information. We applied the model to five TREC data sets. The experiments show that our model can significantly improve retrieval performance over traditional language models, including relevance-based and cluster-based retrieval language models. The improvements come mainly from estimating the relevance model on the semantic clusters that are closely related to the query.
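The sampling idea in the abstract, where the relevance model is estimated from topic clusters rather than from individual documents, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the clusters are already given (the independent component analysis step is not shown), a uniform prior over clusters, and Jelinek-Mercer smoothing with a hypothetical weight `lam`. The function names `language_model` and `cluster_relevance_model` are illustrative only.

```python
from collections import Counter


def language_model(tokens, collection_model, lam=0.5):
    """Jelinek-Mercer smoothed unigram model over the collection vocabulary.

    P(w|C) = lam * P_ml(w|C) + (1 - lam) * P(w|collection)
    """
    counts = Counter(tokens)
    total = len(tokens)
    return {
        w: lam * counts[w] / total + (1 - lam) * p_coll
        for w, p_coll in collection_model.items()
    }


def cluster_relevance_model(query, clusters, lam=0.5):
    """Estimate P(w|R) by sampling from clusters instead of single documents.

    query:    list of query terms
    clusters: list of clusters, each a list of tokenized feedback documents
              (assumed to come from some prior clustering step)
    """
    # Background model estimated from all feedback documents.
    all_tokens = [t for cluster in clusters for doc in cluster for t in doc]
    coll_counts = Counter(all_tokens)
    collection_model = {w: n / len(all_tokens) for w, n in coll_counts.items()}

    scores = Counter()
    for cluster in clusters:
        tokens = [t for doc in cluster for t in doc]
        model = language_model(tokens, collection_model, lam)
        # Query likelihood under the cluster model (uniform cluster prior assumed).
        weight = 1.0
        for q in query:
            weight *= model.get(q, 0.0)
        # Each cluster contributes its term distribution, weighted by how
        # well it explains the query.
        for w, p in model.items():
            scores[w] += p * weight

    norm = sum(scores.values())
    if norm == 0:
        return {}  # no cluster assigns non-zero probability to the query
    return {w: s / norm for w, s in scores.items()}


if __name__ == "__main__":
    clusters = [
        [["jaguar", "car", "engine"], ["car", "speed"]],
        [["jaguar", "cat", "jungle"], ["cat", "prey"]],
    ]
    model = cluster_relevance_model(["jaguar", "car"], clusters)
    for term, prob in sorted(model.items(), key=lambda kv: -kv[1]):
        print(f"{term}\t{prob:.3f}")
```

In this toy example the ambiguous query term "jaguar" appears in both clusters, but the cluster that also explains "car" receives the larger weight, so terms from the unrelated cluster contribute less to the expanded model. This is the intuition behind reducing topic drift, under the stated assumptions.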