Predicting good probabilities with supervised learning

Authors: Alexandru Niculescu-Mizil; Rich Caruana
Affiliations: Cornell University, Ithaca, NY (both authors)
Venue: ICML '05 Proceedings of the 22nd international conference on Machine learning
Year: 2005


Abstract

We examine the relationship between the predictions made by different learning algorithms and true posterior probabilities. We show that maximum margin methods such as boosted trees and boosted stumps push probability mass away from 0 and 1, yielding a characteristic sigmoid-shaped distortion in the predicted probabilities. Models such as Naive Bayes, which make unrealistic independence assumptions, push probabilities toward 0 and 1. Other models, such as neural nets and bagged trees, do not have these biases and predict well-calibrated probabilities. We experiment with two ways of correcting the biased probabilities predicted by some learning methods: Platt Scaling and Isotonic Regression. We qualitatively examine what kinds of distortions these calibration methods are suitable for, and quantitatively examine how much data they need to be effective. The empirical results show that, after calibration, boosted trees, random forests, and SVMs predict the best probabilities.
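
The two calibration methods named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. It assumes the common formulation of Platt Scaling, p = 1 / (1 + exp(a*s + b)), fitted here by plain gradient descent on log loss, and it uses the pool-adjacent-violators (PAV) algorithm for Isotonic Regression. The function names, the toy scores and labels, and the learning-rate settings are all hypothetical, and the PAV sketch assumes distinct scores.

```python
import math

def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit p = 1 / (1 + exp(a*s + b)) by gradient descent on mean log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * s + b))
            # With this sign convention, d(log loss)/d(a*s + b) = y - p.
            grad_a += (y - p) * s
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def predict_platt(a, b, s):
    """Map a raw score to a calibrated probability with the fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(a * s + b))

def fit_isotonic(scores, labels):
    """Pool adjacent violators: return (max_score, value) steps, non-decreasing."""
    blocks = []  # each block holds [sum_of_labels, count, max_score_in_block]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # Merge backwards while the previous block's mean is not below the last one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]):
            total, count, s_max = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = s_max
    return [(s_max, total / count) for total, count, s_max in blocks]

def predict_isotonic(blocks, s):
    """Step-function lookup: value of the first block whose max score covers s."""
    for s_max, value in blocks:
        if s <= s_max:
            return value
    return blocks[-1][1]

# Hypothetical held-out classifier scores (e.g. margins) and binary labels.
scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
labels = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

a, b = fit_platt(scores, labels)
blocks = fit_isotonic(scores, labels)

for s in (-2.0, 0.2, 2.5):
    print(s, round(predict_platt(a, b, s), 3), round(predict_isotonic(blocks, s), 3))
```

The sigmoid has only two parameters, so it suits the sigmoid-shaped distortion described above and can be fitted from little data. The isotonic fit is a free-form monotone step function, which is more flexible but generally needs more calibration data. In both cases the calibration set should be separate from the data used to train the underlying model.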