Best document selection based on approximate utility optimization

Authors:
Hungyu Henry Lin;Yi Zhang;James Davis
Affiliations:
University of California, Santa Cruz, Santa Cruz, CA, USA;University of California, Santa Cruz, Santa Cruz, CA, USA;University of California, Santa Cruz, Santa Cruz, CA, USA
Venue:
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Year:
2011

Citing 5
Cited 0

A decision-theoretic generalization of on-line learning and an application to boosting

EuroCOLT '95 Proceedings of the Second European Conference on Computational Learning Theory
Optimizing search engines using clickthrough data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
An efficient boosting algorithm for combining preferences

The Journal of Machine Learning Research
Finding high-quality content in social media

WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
On the convergence of bound optimization algorithms

UAI'03 Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

This poster describes an alternative approach to handling the best document selection problem. Best document selection is a common problem with many real world applications, but is not a well studied problem by itself; a simple solution would be to treat it as a ranking problem and to use existing ranking algorithms to rank all documents. We could then select only the first element of the sorted list. However, because ranking models optimize for all ranks, the model may sacrifice accuracy of the top rank for the sake of overall accuracy. This is an unnecessary trade-off. We begin by first defining an appropriate objective function for the domain, then create a boosting algorithm that explicitly targets this function. Based on experiments on a benchmark retrieval data set and Digg.com news commenting data set, we find that even a simple algorithm built for this specific problem gives better results than baseline algorithms that were designed for the more complicated ranking tasks.