A Diagnostic Study of Search Result Diversification Methods

Authors:
Wei Zheng;Hui Fang
Affiliations:
Department of Electrical and Computer Engineering, University of Delaware, Newark, DE USA;Department of Electrical and Computer Engineering, University of Delaware, Newark, DE USA
Venue:
Proceedings of the 2013 Conference on the Theory of Information Retrieval
Year:
2013

Abstract

Search result diversification aims to maximize the coverage of different pieces of relevant information in the search results. Many diversification methods have been proposed and studied, but the advantages and disadvantages of each method remain unclear. In this paper, we conduct a diagnostic study of two state-of-the-art diversification methods, with the goal of identifying their weaknesses so that their performance can be further improved. Specifically, we design a set of perturbation tests that isolate the individual factors, i.e., relevance and diversity, that affect diversification performance. The test results are expected to provide insights into how well each method handles these factors during diversification. Experimental results suggest that some methods perform better on queries whose originally retrieved documents are more relevant to the query, while other methods perform better when the documents are more diversified. We therefore propose methods that combine these existing methods based on the predicted factor of the query. Experimental results show that the combined methods can outperform the individual methods on TREC collections.
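
The abstract does not say how the existing methods are combined, so the following is only a minimal sketch of one possible reading: a hard per-query switch between two diversifiers, driven by a predicted relevance score for the initial ranking. The function name `combine_by_predicted_factor`, the `predict_relevance` callback, and the `threshold` parameter are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

# A diversifier takes a query and an initial ranking and returns a re-ranked list.
Diversifier = Callable[[str, Sequence[str]], List[str]]


def combine_by_predicted_factor(
    query: str,
    initial_ranking: Sequence[str],
    predict_relevance: Callable[[str, Sequence[str]], float],
    method_for_relevant: Diversifier,
    method_for_diverse: Diversifier,
    threshold: float = 0.5,
) -> List[str]:
    """Pick one diversifier per query from a predicted score (assumed rule).

    Assumption: a score at or above `threshold` means the initially
    retrieved documents look relevant enough to favor the first method;
    otherwise the second method is used.
    """
    score = predict_relevance(query, initial_ranking)
    chosen = method_for_relevant if score >= threshold else method_for_diverse
    return chosen(query, initial_ranking)


if __name__ == "__main__":
    # Toy stand-ins for illustration only, not real diversification methods.
    def keep_order(q: str, docs: Sequence[str]) -> List[str]:
        return list(docs)

    def reverse_order(q: str, docs: Sequence[str]) -> List[str]:
        return list(reversed(docs))

    def fixed_high(q: str, docs: Sequence[str]) -> float:
        return 0.9

    print(combine_by_predicted_factor(
        "jaguar", ["d1", "d2", "d3"], fixed_high, keep_order, reverse_order
    ))
```

A hard switch is the simplest form such a combination could take. A weighted interpolation of the two methods' scores would be another option, and the abstract is consistent with either.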