Many criteria can be used to evaluate the performance of supervised learning. Different criteria are appropriate in different settings, and it is not always clear which criteria to use. A further complication is that learning methods that perform well on one criterion may not perform well on others. For example, SVMs and boosting are designed to optimize accuracy, whereas neural nets typically optimize squared error or cross entropy. We conducted an empirical study using a variety of learning methods (SVMs, neural nets, k-nearest neighbor, bagged and boosted trees, and boosted stumps) to compare nine Boolean classification performance metrics: Accuracy, Lift, F-Score, Area under the ROC Curve, Average Precision, Precision/Recall Break-Even Point, Squared Error, Cross Entropy, and Probability Calibration. Multidimensional scaling (MDS) shows that these metrics span a low-dimensional manifold. The three metrics that are appropriate when predictions are interpreted as probabilities (squared error, cross entropy, and calibration) lie in one part of metric space, far away from the metrics that depend on the relative order of the predicted values (ROC area, average precision, break-even point, and lift). In between them fall two metrics that depend on comparing predictions to a threshold: accuracy and F-score. As expected, maximum margin methods such as SVMs and boosted trees have excellent performance on metrics like accuracy, but perform poorly on probability metrics such as squared error. What was not expected was that the margin methods also have excellent performance on ordering metrics such as ROC area and average precision. We introduce a new metric, SAR, that combines squared error, accuracy, and ROC area into one metric. MDS and correlation analysis show that SAR is centrally located and correlates well with the other metrics, suggesting that it is a good general-purpose metric to use when more specific criteria are not known.
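The abstract says only that SAR combines squared error, accuracy, and ROC area. It does not give the formula. As a minimal sketch, assume an unweighted average of accuracy, ROC area, and one minus root-mean-squared error, so that all three terms lie in [0, 1] and higher is better. The 0.5 decision threshold, the pairwise ROC-area computation, and all function names below are illustrative assumptions, not details taken from the text.

```python
from math import sqrt


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the 0/1 label.

    The 0.5 threshold is an assumption made for this sketch.
    """
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def roc_area(y_true, y_prob):
    """ROC area as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is quadratic in the number of
    cases, which is fine for a small illustration.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC area needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_prob):
    """Root-mean-squared error between predicted probabilities and 0/1 labels."""
    return sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true))


def sar(y_true, y_prob, threshold=0.5):
    """Assumed SAR: unweighted mean of accuracy, ROC area, and (1 - RMSE)."""
    return (
        accuracy(y_true, y_prob, threshold)
        + roc_area(y_true, y_prob)
        + (1.0 - rmse(y_true, y_prob))
    ) / 3.0


# Toy data, invented purely to exercise the functions.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print("accuracy:", accuracy(labels, scores))
print("ROC area:", roc_area(labels, scores))
print("RMSE:    ", rmse(labels, scores))
print("SAR:     ", sar(labels, scores))
```

Averaging one threshold metric, one ordering metric, and one probability metric mirrors the three groups the abstract describes, which is one plausible reading of why such a combination would sit centrally among the other metrics.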