Data mining in metric space: an empirical analysis of supervised learning performance criteria

Authors:
Rich Caruana; Alexandru Niculescu-Mizil
Affiliations:
Cornell University; Cornell University
Venue:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2004

Citing 1
Cited 45

Tree Induction for Probability-Based Ranking

Machine Learning

A support vector method for multivariate performance measures

ICML '05 Proceedings of the 22nd international conference on Machine learning
An empirical comparison of supervised learning algorithms

ICML '06 Proceedings of the 23rd international conference on Machine learning
Adaptive communal detection in search of adversarial identity crime

Proceedings of the 2007 international workshop on Domain driven data mining
A Multi-criteria Convex Quadratic Programming model for credit data analysis

Decision Support Systems
A boosting algorithm for learning bipartite ranking functions with partially labeled data

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Classifier Loss Under Metric Uncertainty

ECML '07 Proceedings of the 18th European conference on Machine Learning
Hinge Rank Loss and the Area Under the ROC Curve

ECML '07 Proceedings of the 18th European conference on Machine Learning
Proper Model Selection with Significance Test

ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
A Visualization-Based Exploratory Technique for Classifier Comparison with Respect to Multiple Metrics and Multiple Domains

ECML PKDD '08 Proceedings of the European conference on Machine Learning and Knowledge Discovery in Databases - Part II
Error-driven generalist+experts (edge): a multi-stage ensemble framework for text categorization

Proceedings of the 17th ACM conference on Information and knowledge management
An experimental comparison of performance measures for classification

Pattern Recognition Letters
Quantifying the impact of learning algorithm parameter tuning

AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Several SVM Ensemble Methods Integrated with Under-Sampling for Imbalanced Data Learning

ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Constructing new and better evaluation measures for machine learning

IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Probabilistic classification vector machines

IEEE Transactions on Neural Networks
Aggregating performance metrics for classifier evaluation

IRI'09 Proceedings of the 10th IEEE international conference on Information Reuse & Integration
Differentiating between individual class performance in genetic programming fitness for classification with unbalanced data

CEC'09 Proceedings of the Eleventh conference on Congress on Evolutionary Computation
Similarity-binning averaging: a generalisation of binning calibration

IDEAL'09 Proceedings of the 10th international conference on Intelligent data engineering and automated learning
Evaluating learning algorithms and classifiers

International Journal of Intelligent Information and Database Systems
Learning patient-specific predictive models from clinical data

Journal of Biomedical Informatics
Memetic Pareto Evolutionary Artificial Neural Networks to determine growth/no-growth in predictive microbiology

Applied Soft Computing
Customer Validation of Commercial Predictive Models

Proceedings of the 2010 conference on Data Mining for Business Applications
A data mining framework for detecting subscription fraud in telecommunication

Engineering Applications of Artificial Intelligence
Multi-objective evolutionary algorithms for feature selection: application in bankruptcy prediction

SEAL'10 Proceedings of the 8th international conference on Simulated evolution and learning
Learning Instance-Specific Predictive Models

The Journal of Machine Learning Research
Memetic evolutionary multi-objective neural network classifier to predict graft survival in liver transplant patients

Proceedings of the 13th annual conference companion on Genetic and evolutionary computation
AI based supervised classifiers: an analysis for intrusion detection

ACAI '11 Proceedings of the International Conference on Advances in Computing and Artificial Intelligence
Out of sight, not out of mind: on the effect of social and physical detachment on information need

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Memetic pareto differential evolutionary neural network for donor-recipient matching in liver transplantation

IWANN'11 Proceedings of the 11th international conference on Artificial neural networks conference on Advances in computational intelligence - Volume Part II
On the effectiveness of preprocessing methods when dealing with different levels of class imbalance

Knowledge-Based Systems
Memetic Elitist Pareto Differential Evolution algorithm based Radial Basis Function Networks for classification problems

Applied Soft Computing
Statistical-based abbreviation expansion

TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue
Temporal representation in spike detection of sparse personal identity streams

WISI'06 Proceedings of the 2006 international conference on Intelligence and Security Informatics
Hybrid multi-objective machine learning classification in liver transplantation

HAIS'12 Proceedings of the 7th international conference on Hybrid Artificial Intelligent Systems - Volume Part I
A GA-Based wrapper feature selection for animal breeding data mining

HAIS'12 Proceedings of the 7th international conference on Hybrid Artificial Intelligent Systems - Volume Part II
Review: Supervised classification and mathematical optimization

Computers and Operations Research
The Effect of Social and Physical Detachment on Information Need

ACM Transactions on Information Systems (TOIS)
The use of artificial-intelligence-based ensembles for intrusion detection: a review

Applied Computational Intelligence and Soft Computing
Predicting patient survival after liver transplantation using evolutionary multi-objective artificial neural networks

Artificial Intelligence in Medicine
Efficacious end user measures part 1: relative class size and end user problem domains

Advances in Artificial Intelligence - Special issue on Artificial Intelligence Applications in Biomedicine
Predictive model performance: offline and online evaluations

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Beyond Fano's inequality: bounds on the optimal F-score, BER, and cost-sensitive risk and their implications

The Journal of Machine Learning Research
On the effect of calibration in classifier combination

Applied Intelligence
Assessing scorecard performance: A literature review and classification

Expert Systems with Applications: An International Journal
Capturing the essence of word-of-mouth for social commerce: Assessing the quality of online e-commerce reviews by a semi-supervised approach

Decision Support Systems

Abstract

Many criteria can be used to evaluate the performance of supervised learning. Different criteria are appropriate in different settings, and it is not always clear which criteria to use. A further complication is that learning methods that perform well on one criterion may not perform well on other criteria. For example, SVMs and boosting are designed to optimize accuracy, whereas neural nets typically optimize squared error or cross entropy. We conducted an empirical study using a variety of learning methods (SVMs, neural nets, k-nearest neighbor, bagged and boosted trees, and boosted stumps) to compare nine boolean classification performance metrics: Accuracy, Lift, F-Score, Area under the ROC Curve, Average Precision, Precision/Recall Break-Even Point, Squared Error, Cross Entropy, and Probability Calibration. Multidimensional scaling (MDS) shows that these metrics span a low-dimensional manifold. The three metrics that are appropriate when predictions are interpreted as probabilities (squared error, cross entropy, and calibration) lie in one part of metric space, far away from the metrics that depend on the relative order of the predicted values (ROC area, average precision, break-even point, and lift). In between them fall two metrics that depend on comparing predictions to a threshold: accuracy and F-score. As expected, maximum margin methods such as SVMs and boosted trees have excellent performance on metrics like accuracy, but perform poorly on probability metrics such as squared error. What was not expected was that the margin methods have excellent performance on ordering metrics such as ROC area and average precision. We introduce a new metric, SAR, that combines squared error, accuracy, and ROC area into one metric. MDS and correlation analysis show that SAR is centrally located and correlates well with other metrics, suggesting that it is a good general-purpose metric to use when more specific criteria are not known.
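
The abstract says that SAR combines squared error, accuracy, and ROC area, but it does not state the formula. The sketch below is one plausible reading, not the authors' code: it assumes an equal-weight average of accuracy, ROC area, and one minus root-mean-squared error, with a 0.5 decision threshold. The function names (`accuracy`, `rmse`, `roc_area`, `sar`) and the threshold are illustrative assumptions.

```python
from math import sqrt


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases where thresholding the prediction matches the label."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def rmse(y_true, y_prob):
    """Root-mean-squared error between predicted probabilities and 0/1 labels."""
    return sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true))


def roc_area(y_true, y_prob):
    """ROC area as the probability that a positive outranks a negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a small illustration.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC area needs at least one positive and one negative")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def sar(y_true, y_prob, threshold=0.5):
    """Assumed combination: mean of accuracy, ROC area, and (1 - RMSE)."""
    return (
        accuracy(y_true, y_prob, threshold)
        + roc_area(y_true, y_prob)
        + (1.0 - rmse(y_true, y_prob))
    ) / 3.0


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.6, 0.4, 0.1]
    print(round(sar(labels, scores), 4))
```

In this toy example the ranking and the thresholded labels are both perfect, so only the squared-error term pulls the score below 1. That mirrors the abstract's point that a model can do well on threshold and ordering metrics while scoring worse on probability metrics.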