Search engine results are often biased towards a certain aspect of a query or towards a certain meaning of an ambiguous query term. Diversification of search results offers a way to supply the user with a better-balanced result set, increasing the probability that the user finds at least one document that suits her information need. In this paper, we present a reranking approach based on minimizing the variance of Web search results to improve topic coverage in the top-k results. We investigate two different document representations as the basis for reranking: smoothed language models and topic models derived by Latent Dirichlet Allocation. To evaluate our approach, we selected 240 queries from Wikipedia disambiguation pages. This provides us with ambiguous queries together with a community-generated, balanced representation of their (sub)topics. For these queries we crawled the results of two major commercial search engines. In addition, we present a new evaluation strategy based on Kullback-Leibler divergence and Wikipedia. We evaluate this method using the TREC sub-topic evaluation on the one hand, and manually annotated query results on the other hand. Our results show that minimizing variance in search results by reranking relevant pages significantly improves topic coverage in the top-k results with respect to Wikipedia, and gives a good overview of the overall search result. Moreover, latent topic models achieve competitive diversification with significantly less reranking. Finally, our evaluation reveals that our automatic evaluation strategy using Kullback-Leibler divergence correlates well with α-nDCG scores used in manual evaluation efforts.
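The idea of scoring a result list by its Kullback-Leibler divergence from a balanced reference can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the helper names `kl_divergence` and `topic_distribution`, the additive smoothing constant, the direction of the divergence (reference against result list), the uniform reference distribution, and the toy topic labels are all hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) in nats for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            # eps guards against division by zero when q assigns no mass.
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


def topic_distribution(labels, topics, smoothing=0.5):
    """Turn per-document topic labels into a smoothed distribution over topics."""
    counts = {topic: smoothing for topic in topics}
    for label in labels:
        if label in counts:
            counts[label] += 1
    norm = sum(counts.values())
    return [counts[topic] / norm for topic in topics]


# Hypothetical subtopics of an ambiguous query, assumed equally weighted
# in the reference (an assumption, not the paper's reference model).
topics = ["animal", "car", "operating system"]
reference = [1.0 / len(topics)] * len(topics)

# Toy top-5 result lists, labelled with the subtopic each document covers.
biased_top5 = ["car", "car", "car", "car", "car"]
balanced_top5 = ["animal", "car", "operating system", "animal", "car"]

kl_biased = kl_divergence(reference, topic_distribution(biased_top5, topics))
kl_balanced = kl_divergence(reference, topic_distribution(balanced_top5, topics))

print(f"biased list:   {kl_biased:.4f}")
print(f"balanced list: {kl_balanced:.4f}")
```

Under these assumptions, a list that covers the reference subtopics more evenly yields a lower divergence, which is the sense in which such a score could serve as an automatic proxy for topic coverage.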