The proportion of successful hits, usually referred to as "accuracy", is by far the dominant measure of classifier performance. This is despite the fact that accuracy does not compensate for hits that can be attributed to mere chance. Is this a meaningful flaw in the context of machine learning? Have we been using the wrong measure for decades? The results of this study suggest that the answer to both questions is yes. Cohen's kappa, a measure that does compensate for random hits, was compared with accuracy using a benchmark of fifteen datasets and five well-known classifiers. It turned out that the average probability of a hit being the result of mere chance exceeded one third. It was also found that the proportion of random hits varied across classifiers even when they were applied to the same dataset. Consequently, the rankings of the classifiers, with and without compensation for random hits, differed in eight of the fifteen datasets. Accuracy may therefore fail at its main task: properly measuring the accuracy-related merits of the classifiers themselves.
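To make the chance correction concrete, the following sketch shows how Cohen's kappa is computed from a confusion matrix and how it differs from plain accuracy. This is an illustrative example only (it assumes NumPy, and the toy confusion matrix is invented), not the study's actual evaluation code or benchmark data.

```python
import numpy as np

def cohens_kappa(confusion: np.ndarray) -> float:
    """Chance-corrected agreement computed from a square confusion matrix."""
    total = confusion.sum()
    p_observed = np.trace(confusion) / total  # plain accuracy (observed hit rate)
    # Expected hit rate if predictions matched the true labels only by chance,
    # given the same marginal class frequencies.
    p_expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total**2
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical two-class confusion matrix (rows = true class, columns = predicted).
cm = np.array([[45, 5],
               [15, 35]])

accuracy = np.trace(cm) / cm.sum()
print(f"accuracy = {accuracy:.2f}, kappa = {cohens_kappa(cm):.2f}")
# Here accuracy is 0.80, while kappa is only 0.60: roughly a third of the
# hits are no better than what the class marginals alone would produce.
```

The gap between the two numbers in this toy example mirrors the paper's central point: a classifier's accuracy can look substantially better than its chance-corrected agreement.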