Cost-sensitive multi-class classification from probability estimates

Authors:
Deirdre B. O'Brien;Maya R. Gupta;Robert M. Gray
Affiliations:
Google Inc., Mountain View, CA;University Of Washington, Seattle, WA;Stanford University, Stanford, CA
Venue:
Proceedings of the 25th international conference on Machine learning
Year:
2008

Citing 6
Cited 2

MetaCost: a general method for making classifiers cost-sensitive

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Robust Classification for Imprecise Environments

Machine Learning
On Bias, Variance, 0/1—Loss, and the Curse-of-Dimensionality

Data Mining and Knowledge Discovery
Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Transforming classifier scores into accurate multiclass probability estimates

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Predicting good probabilities with supervised learning

ICML '05 Proceedings of the 22nd international conference on Machine learning

Cost-sensitive learning based on Bregman divergences

Machine Learning
Towards cost-sensitive learning for real-world applications

PAKDD'11 Proceedings of the 15th international conference on New Frontiers in Applied Data Mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

For two-class classification, it is common to classify by setting a threshold on class probability estimates, where the threshold is determined by ROC curve analysis. An analog for multi-class classification is learning a new class partitioning of the multiclass probability simplex to minimize empirical misclassification costs. We analyze the interplay between systematic errors in the class probability estimates and cost matrices for multiclass classification. We explore the effect on the class partitioning of five different transformations of the cost matrix. Experiments on benchmark datasets with naive Bayes and quadratic discriminant analysis show the effectiveness of learning a new partition matrix compared to previously proposed methods.