A methodology for comparing classifiers that allow the control of bias

  • Authors:
  • Anton Zamolotskikh; Sarah Jane Delany; Pádraig Cunningham

  • Affiliations:
  • University of Dublin, Dublin, Ireland; Dublin Institute of Technology, Dublin, Ireland; University of Dublin, Dublin, Ireland

  • Venue:
  • Proceedings of the 2006 ACM symposium on Applied computing
  • Year:
  • 2006

Abstract

This paper presents False Positive-Critical Classifiers Comparison, a new technique for pairwise comparison of classifiers that allow the control of bias. An evaluation of Naïve Bayes, k-Nearest Neighbour and Support Vector Machine classifiers was carried out on five datasets of unsolicited and legitimate e-mail messages to confirm the advantage of the technique over Receiver Operating Characteristic (ROC) curves. The results suggest that the technique is useful for choosing the better classifier when the ROC curves do not show clear differences, and for showing that the difference between two classifiers is not significant when the ROC curves suggest it might be. Spam filtering is a typical application for such a comparison tool, as it requires a classifier that is biased toward negative prediction and that keeps the rate of false positives below an upper limit. Finally, an evaluation summary is presented which confirms that Support Vector Machines outperform the other methods in most cases, while the Naïve Bayes classifier works well in a narrow but relevant range of false positive rates.
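
Because the comparison hinges on evaluating classifiers under an upper limit on the false positive rate rather than over the whole ROC curve, a minimal sketch of that setting is given below. It is not the FPCCC procedure itself (the abstract does not specify it in detail); the synthetic dataset, the classifier choices and the 1% FPR limit are all assumptions made for illustration.

```python
# Illustration only: compare classifiers by the true positive rate they
# achieve while keeping the false positive rate under a fixed limit.
# Dataset, classifiers and the 1% limit are assumptions, not the paper's setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import roc_curve

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

classifiers = {
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(probability=True, random_state=0),
}

max_fpr = 0.01  # assumed upper limit on the false positive rate

for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    scores = clf.predict_proba(X_te)[:, 1]
    fpr, tpr, _ = roc_curve(y_te, scores)
    # Best true positive rate reachable without exceeding the FPR limit.
    tpr_at_limit = tpr[fpr <= max_fpr].max()
    print(f"{name}: TPR = {tpr_at_limit:.3f} at FPR <= {max_fpr}")
```

In this bias-controlled view, two classifiers are compared only over the operating points a spam filter could actually use, which is why it can separate classifiers whose full ROC curves look similar, or show no meaningful difference where the curves appear to diverge.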