We study the accuracy of evaluation metrics used to estimate the efficacy of predictive models. Offline evaluation metrics are indicators of expected model performance on real data. In practice, however, we often observe a substantial discrepancy between the offline and online performance of a model. We investigate the characteristics and behavior of evaluation metrics in offline and online testing, both analytically and empirically, through experiments on online advertising data from the Bing search engine. One of our findings is that some offline metrics, such as AUC (the Area Under the Receiver Operating Characteristic Curve) and RIG (Relative Information Gain), which summarize model performance over the entire spectrum of operating points, can be quite misleading and can lead to significant discrepancies between offline and online results. For example, for click prediction models in search advertising, prediction errors in the very low range of predicted click scores hurt online performance much more than errors in other regions. However, most of the offline metrics we studied, including AUC and RIG, are insensitive to such model behavior. We designed a new model evaluation paradigm that simulates the online behavior of predictive models: for a set of ads selected by a new prediction model, online user behavior is estimated from historical user behavior in the search logs. Experimental results on click prediction models for search advertising are highly promising.
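To make the two offline metrics named above concrete, here is a minimal Python sketch of how AUC and RIG might be computed on a small labeled sample. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give formulas, so RIG is taken here in one commonly used form, 1 minus the ratio of the model's log loss to the log loss of always predicting the empirical click-through rate. The function names and the toy data are hypothetical.

```python
import math


def auc(labels, scores):
    """Pairwise AUC: the chance that a random positive example is scored
    above a random negative one, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def log_loss(labels, probs, eps=1e-12):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def rig(labels, probs):
    """Relative information gain (assumed definition): improvement in log loss
    over a baseline that always predicts the empirical click-through rate."""
    ctr = sum(labels) / len(labels)
    baseline = log_loss(labels, [ctr] * len(labels))
    return 1.0 - log_loss(labels, probs) / baseline


# Hypothetical toy sample: 1 = click, 0 = no click.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.02, 0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.80]

# A monotone distortion of the scores keeps the ranking, so AUC is unchanged,
# while RIG, which depends on the predicted probabilities themselves, shifts.
squared = [p * p for p in probs]

print("AUC original:", auc(labels, probs), "distorted:", auc(labels, squared))
print("RIG original:", rig(labels, probs), "distorted:", rig(labels, squared))
```

The sketch only shows what each metric measures: AUC depends solely on the ordering of scores, whereas RIG depends on their calibration. It does not attempt to reproduce the offline/online discrepancy reported in the abstract, which would require online serving data.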