On the use of ROC analysis for the optimization of abstaining classifiers

Authors: Tadeusz Pietraszek
Affiliations: IBM Zurich Research Laboratory, Rüschlikon, Switzerland 8803
Venue: Machine Learning
Year: 2007

Abstract

Classifiers that refrain from classification in certain cases can significantly reduce misclassification cost. However, the parameters of such abstaining classifiers are often set in an ad hoc manner. We propose a method for optimally building a specific type of abstaining binary classifier using ROC analysis. These classifiers are built according to optimization criteria in three models: cost-based, bounded-abstention, and bounded-improvement. We show that selecting the optimal classifier in the first model is similar to the known iso-performance lines approach and uses only the slopes of ROC curves, whereas selecting the optimal classifier in the remaining two models is not straightforward. We investigate the properties of the convex-down ROCCH (ROC Convex Hull) and present a simple and efficient algorithm for finding the optimal classifier in the bounded-abstention and bounded-improvement models. We demonstrate the application of these models to effectively reduce misclassification cost in real-life classification systems. The method has been validated with an ROC building algorithm and cross-validation on 15 UCI KDD datasets.
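
To make the cost-based and bounded-abstention criteria concrete, the following is a minimal brute-force sketch, not the ROCCH slope-based algorithm the abstract refers to. It assumes a scoring classifier, a pair of thresholds that define an abstention window, and a single flat abstention cost. The function name `best_abstention_window` and the parameters `c_fp`, `c_fn`, `c_abstain`, and `max_abstain` are hypothetical and chosen only for illustration.

```python
from itertools import combinations_with_replacement


def best_abstention_window(scores, labels, c_fp, c_fn, c_abstain, max_abstain=1.0):
    """Exhaustively search for thresholds (t_low, t_high) with minimal average cost.

    Decision rule (an assumption of this sketch):
      score <  t_low           -> predict negative
      score >= t_high          -> predict positive
      t_low <= score < t_high  -> abstain
    Windows whose abstention fraction exceeds max_abstain are skipped, which
    mimics a bounded-abstention constraint. Labels are 1 (positive) or 0.
    Returns (average_cost, t_low, t_high, abstention_fraction), or None if
    the inputs are empty.
    """
    n = len(scores)
    if n == 0:
        return None
    # Candidate thresholds: each observed score, plus +inf so that
    # "never predict positive" is also representable.
    candidates = sorted(set(scores)) + [float("inf")]
    best = None
    # combinations_with_replacement on a sorted list yields t_low <= t_high;
    # t_low == t_high is an ordinary classifier with no abstention.
    for t_low, t_high in combinations_with_replacement(candidates, 2):
        cost = 0.0
        abstained = 0
        for s, y in zip(scores, labels):
            if s < t_low:
                cost += c_fn if y == 1 else 0.0   # missed positive
            elif s >= t_high:
                cost += c_fp if y == 0 else 0.0   # false alarm
            else:
                cost += c_abstain
                abstained += 1
        rate = abstained / n
        if rate > max_abstain:
            continue
        avg = cost / n
        if best is None or avg < best[0]:
            best = (avg, t_low, t_high, rate)
    return best
```

Calling the function with `max_abstain=0.0` reduces it to ordinary cost-sensitive threshold selection, which gives a baseline for how much the abstention window lowers the average cost. The search is quadratic in the number of distinct scores, so it is only suitable for small examples. The abstract's point is that ROC slopes allow the optimum to be found far more efficiently.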