IR evaluation methods for retrieving highly relevant documents. SIGIR '00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval.
The maximum entropy method for analyzing retrieval measures. Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval.
Information retrieval system evaluation: effort, sensitivity, and reliability. Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval.
Learning to rank using gradient descent. ICML '05: Proceedings of the 22nd international conference on Machine learning.
A support vector method for multivariate performance measures. ICML '05: Proceedings of the 22nd international conference on Machine learning.
Optimisation methods for ranking functions with multiple parameters. CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management.
On rank-based effectiveness measures and optimization. Information Retrieval.
A support vector method for optimizing average precision. SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval.
AdaRank: a boosting algorithm for information retrieval. SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval.
SoftRank: optimizing non-smooth rank metrics. WSDM '08: Proceedings of the 2008 International Conference on Web Search and Data Mining.
A new interpretation of average precision. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval.
Precision-at-ten considered redundant. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval.
Estimating average precision when judgments are incomplete. Knowledge and Information Systems.
AUC: a statistically consistent and more discriminating measure than accuracy. IJCAI '03: Proceedings of the 18th international joint conference on Artificial intelligence.
Gradient descent optimization of smoothed information retrieval metrics. Information Retrieval.
On statistical analysis and optimization of information retrieval effectiveness metrics. Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval.
On the informativeness of cascade and intent-aware effectiveness measures. Proceedings of the 20th international conference on World Wide Web.
On the suitability of diversity metrics for learning-to-rank for diversity. Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval.
On smoothing average precision. ECIR '12: Proceedings of the 34th European conference on Advances in Information Retrieval.
Robust query rewriting using anchor data. Proceedings of the sixth ACM international conference on Web search and data mining.
Two-stage learning to rank for information retrieval. ECIR '13: Proceedings of the 35th European conference on Advances in Information Retrieval.
The whens and hows of learning to rank for web search. Information Retrieval.
Most current machine learning methods for building search engines assume that there is a target evaluation metric that measures the quality of the engine with respect to the end user, and that the engine should be trained to optimize that metric. Treating the target metric as given, many approaches (e.g. LambdaRank, SoftRank, RankingSVM) have been proposed for optimizing retrieval metrics. The target metric used in optimization acts as a bottleneck that summarizes the training data, and it is known that some evaluation metrics are more informative than others. In this paper, we consider the effect of the target evaluation metric on learning to rank. In particular, we question the assumption that retrieval systems should be trained to directly optimize the metric that is assumed to measure user satisfaction. We show that even if user satisfaction can be measured by a metric X, optimizing the engine on the training set for a more informative metric Y may yield better test performance according to X than optimizing the engine directly for X. We also analyze when the difference between the two approaches is significant, in terms of the amount of available training data and the dimensionality of the feature space.
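To make the train-for-Y, test-on-X protocol concrete, the following is a minimal, self-contained sketch (not code from the paper) that fits a linear ranker on synthetic data by directly optimizing either the target metric X (precision at 10) or a more informative metric Y (average precision), and then compares test performance measured by X. The data generator, the random-search optimizer, and all names are illustrative assumptions; random search is used only because it can maximize any non-smooth rank metric directly, standing in for a real learning-to-rank method such as SoftRank or LambdaRank.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 5
W_TRUE = rng.normal(size=N_FEATURES)  # hidden "true" relevance model (assumption)

def precision_at_k(labels_in_rank_order, k=10):
    # Metric X: fraction of relevant documents in the top k positions.
    return float(labels_in_rank_order[:k].mean())

def average_precision(labels_in_rank_order):
    # Metric Y: mean of the precision values at the ranks of relevant documents.
    rel_ranks = np.flatnonzero(labels_in_rank_order)
    if rel_ranks.size == 0:
        return 0.0
    precisions = np.arange(1, rel_ranks.size + 1) / (rel_ranks + 1)
    return float(precisions.mean())

def make_queries(n_queries, n_docs=50):
    # Synthetic queries: binary relevance loosely tied to the hidden model plus noise.
    queries = []
    for _ in range(n_queries):
        X = rng.normal(size=(n_docs, N_FEATURES))
        y = (X @ W_TRUE + rng.normal(scale=2.0, size=n_docs) > 1.0).astype(float)
        queries.append((X, y))
    return queries

def evaluate(w, queries, metric):
    # Mean metric value over queries, ranking documents by the linear score X @ w.
    vals = []
    for X, y in queries:
        order = np.argsort(-(X @ w))
        vals.append(metric(y[order]))
    return float(np.mean(vals))

def fit_by_random_search(train_queries, metric, n_trials=300):
    # Crude direct optimization of a non-smooth rank metric: keep the best random
    # weight vector found on the training set.
    best_w, best_val = None, -1.0
    for _ in range(n_trials):
        w = rng.normal(size=N_FEATURES)
        val = evaluate(w, train_queries, metric)
        if val > best_val:
            best_w, best_val = w, val
    return best_w

if __name__ == "__main__":
    train, test = make_queries(20), make_queries(200)
    p_at_10 = lambda labels: precision_at_k(labels, k=10)

    w_x = fit_by_random_search(train, p_at_10)             # train directly for X
    w_y = fit_by_random_search(train, average_precision)   # train for the more informative Y

    print("test P@10 when trained on P@10:", evaluate(w_x, test, p_at_10))
    print("test P@10 when trained on AP:  ", evaluate(w_y, test, p_at_10))
```

Varying the number of training queries and N_FEATURES in this sketch mirrors the two factors analyzed in the paper: the amount of training data and the dimensionality of the feature space.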