Copulas for information retrieval

Authors:
Carsten Eickhoff; Arjen P. de Vries; Kevyn Collins-Thompson
Affiliations:
Delft University of Technology, Delft, Netherlands; Centrum Wiskunde & Informatica, Amsterdam, Netherlands; Microsoft Research, Redmond, WA, USA
Venue:
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Year:
2013

Abstract

In many domains of information retrieval, system estimates of document relevance are based on multidimensional quality criteria that have to be accommodated in a unidimensional result ranking. Current solutions to this challenge are often inconsistent with the formal probabilistic framework in which the constituent scores were estimated, or they rely on sophisticated learning methods that make it difficult for humans to understand the origin of the final ranking. To address these issues, we introduce copulas, a powerful statistical framework for modeling complex multidimensional dependencies, to information retrieval tasks. We provide a formal background to copulas and demonstrate their effectiveness on standard IR tasks such as combining multidimensional relevance estimates and fusing results from multiple search engines. We introduce copula-based versions of standard relevance estimators and fusion methods and show that, compared to their non-copula counterparts, these lead to significant performance improvements on several tasks, as evaluated on large-scale standard corpora. We also investigate criteria for understanding the likely effect of using copula models in a given retrieval scenario.
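
The idea of folding several relevance dimensions into one ranking score through a copula can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which copula family or marginal estimator is used, so the Gumbel copula, the rank-based empirical CDF, the function names (`empirical_cdf`, `gumbel_copula`, `rank_documents`), the parameter `theta`, and the toy scores are all assumptions made here for illustration.

```python
import math
from bisect import bisect_right


def empirical_cdf(sample):
    """Return a function mapping a raw score to a rank-based CDF value."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        # Dividing by n + 1 keeps in-sample values strictly inside (0, 1).
        return bisect_right(ordered, x) / (n + 1)

    return cdf


def gumbel_copula(u, v, theta=2.0):
    """Bivariate Gumbel copula C(u, v); theta = 1 reduces to the product u * v."""
    if theta < 1.0:
        raise ValueError("theta must be >= 1 for the Gumbel copula")
    eps = 1e-12  # clamp to avoid log(0) for out-of-sample scores
    u = min(max(u, eps), 1.0 - eps)
    v = min(max(v, eps), 1.0 - eps)
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def rank_documents(docs, theta=2.0):
    """Rank documents by the copula of their two marginal CDF values.

    docs maps a document id to (topical_score, second_criterion_score).
    """
    topical_cdf = empirical_cdf([t for t, _ in docs.values()])
    second_cdf = empirical_cdf([q for _, q in docs.values()])
    combined = {
        doc: gumbel_copula(topical_cdf(t), second_cdf(q), theta)
        for doc, (t, q) in docs.items()
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (invented): topical score and one further quality criterion.
docs = {
    "d1": (2.1, 0.9),
    "d2": (1.4, 0.2),
    "d3": (0.3, 0.8),
    "d4": (1.9, 0.5),
}
ranking = rank_documents(docs)
for doc, score in ranking:
    print(doc, round(score, 4))
```

With `theta=1.0` the sketch falls back to multiplying the two marginal values, which corresponds to treating the dimensions as independent; larger `theta` values model stronger dependence between them. In practice `theta` would have to be estimated from data rather than fixed by hand.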