Less is more: probabilistic models for retrieving fewer relevant documents

Authors:
Harr Chen;David R. Karger
Affiliations:
MIT CSAIL, Cambridge, MA;MIT CSAIL, Cambridge, MA
Venue:
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2006

Citing 11
Cited 105

Abstract

Traditionally, information retrieval systems aim to maximize the number of relevant documents returned to a user within some window of the top. For that goal, the probability ranking principle, which ranks documents in decreasing order of probability of relevance, is provably optimal. However, there are many scenarios in which that ranking does not optimize for the user's information need. One example is when the user would be satisfied with some limited number of relevant documents, rather than needing all relevant documents. We show that in such a scenario, an attempt to return many relevant documents can actually reduce the chances of finding any relevant documents.

We consider a number of information retrieval metrics from the literature, including the rank of the first relevant result, the %no metric that penalizes a system only for retrieving no relevant results near the top, and the diversity of retrieved results when queries have multiple interpretations. We observe that given a probabilistic model of relevance, it is appropriate to rank so as to directly optimize these metrics in expectation. While doing so may be computationally intractable, we show that a simple greedy optimization algorithm that approximately optimizes the given objectives produces rankings for TREC queries that outperform the standard approach based on the probability ranking principle.
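The contrast drawn in the abstract can be illustrated with a small sketch. This is not the authors' implementation or relevance model: the latent-interpretation model, the names PRIOR and REL, and all probabilities below are hypothetical, chosen only to show how ranking by marginal probability of relevance can differ from greedily maximizing the expected chance of retrieving at least one relevant document in the top k.

```python
from typing import List

# Hypothetical toy model (an assumption, not taken from the paper):
# the query has two latent interpretations, and each document is relevant
# under an interpretation with a fixed probability, independently of the
# other documents once the interpretation is known.
PRIOR = {"animal": 0.6, "car": 0.4}
REL = {
    "a1": {"animal": 0.9, "car": 0.0},
    "a2": {"animal": 0.9, "car": 0.0},
    "c1": {"animal": 0.0, "car": 0.9},
}


def marginal_relevance(doc: str) -> float:
    """Probability that doc is relevant, averaged over interpretations."""
    return sum(PRIOR[t] * REL[doc][t] for t in PRIOR)


def expected_at_least_one(docs: List[str]) -> float:
    """Probability that at least one document in docs is relevant."""
    total = 0.0
    for topic, prior in PRIOR.items():
        miss = 1.0  # probability that every listed document is irrelevant
        for doc in docs:
            miss *= 1.0 - REL[doc][topic]
        total += prior * (1.0 - miss)
    return total


def prp_ranking(k: int) -> List[str]:
    """Rank by decreasing marginal probability of relevance (ties by name)."""
    return sorted(REL, key=lambda d: (-marginal_relevance(d), d))[:k]


def greedy_ranking(k: int) -> List[str]:
    """Greedily add the document that most increases expected_at_least_one."""
    chosen: List[str] = []
    remaining = set(REL)
    while len(chosen) < k and remaining:
        # sorted() makes tie-breaking deterministic (first maximum wins).
        best = max(sorted(remaining),
                   key=lambda d: expected_at_least_one(chosen + [d]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    for name, ranking in (("PRP", prp_ranking(2)), ("greedy", greedy_ranking(2))):
        expected_relevant = sum(marginal_relevance(d) for d in ranking)
        print(name, ranking,
              "P(at least one relevant) =", round(expected_at_least_one(ranking), 3),
              "E[number relevant] =", round(expected_relevant, 3))
```

Under these made-up numbers, the marginal-probability ranking fills both slots with documents for the more likely interpretation, which yields a higher expected count of relevant documents but a lower chance of returning any relevant document than the greedy list, which covers both interpretations.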