ROC 'n' rule learning: towards a better understanding of covering algorithms

Authors:
Johannes Fürnkranz; Peter A. Flach
Affiliations:
TU Darmstadt, Knowledge Engineering Group, Hochschulstraße 10, D-64289 Darmstadt, Germany; Department of Computer Science, University of Bristol, Woodland Road, Bristol BS8 1UB, UK
Venue:
Machine Learning
Year:
2005

Abstract

This paper provides an analysis of the behavior of separate-and-conquer or covering rule learning algorithms by visualizing their evaluation metrics and their dynamics in coverage space, a variant of ROC space. Our results show that most commonly used metrics, including accuracy, weighted relative accuracy, entropy, and Gini index, are equivalent to one of two fundamental prototypes: precision, which tries to optimize the area under the ROC curve for unknown costs, and a cost-weighted difference between covered positive and negative examples, which tries to find the optimal point under known or assumed costs. We also show that a straightforward generalization of the m-estimate trades off these two prototypes. Furthermore, our results show that stopping and filtering criteria like CN2's significance test focus on identifying significant deviations from random classification, which does not necessarily avoid overfitting. We also identify a problem with Foil's MDL-based encoding length restriction, which proves to be largely equivalent to a variable threshold on the recall of the rule. In general, we interpret these results as evidence that, contrary to common conception, pre-pruning heuristics are not very well understood and deserve more investigation.
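The two prototypes named in the abstract, and the way an m-estimate moves between them, can be illustrated with a small sketch. This is not the authors' code: the function names, the example rules, and the choice of prior are assumptions made for illustration. Here `p` and `n` are the positive and negative examples a rule covers (its point in coverage space), `c` is an assumed cost ratio, and `prior` is the assumed prior weight in the standard m-estimate `(p + m * prior) / (p + n + m)`. For `m = 0` this reduces to precision; for large `m` its ranking of rules approaches that of `p - (prior / (1 - prior)) * n`, a cost-weighted difference.

```python
def precision(p, n):
    """Fraction of covered examples that are positive."""
    return p / (p + n)


def cost_weighted_difference(p, n, c=1.0):
    """Covered positives minus covered negatives weighted by cost ratio c.

    c = 1 ranks rules the same way as accuracy; c = P / N (total positives
    over total negatives) ranks them the same way as weighted relative accuracy.
    """
    return p - c * n


def m_estimate(p, n, m, prior):
    """Standard m-estimate: precision smoothed towards `prior` with weight m."""
    return (p + m * prior) / (p + n + m)


# Two hypothetical rules on a data set with 50 positives and 50 negatives.
rule_a = (5, 0)     # small but pure: covers 5 positives, 0 negatives
rule_b = (40, 10)   # large but impure: covers 40 positives, 10 negatives
prior = 0.5         # assumed prior, here P / (P + N)

for name, (p, n) in (("A", rule_a), ("B", rule_b)):
    print(
        name,
        "precision=%.3f" % precision(p, n),
        "difference=%.1f" % cost_weighted_difference(p, n),
        "m=0: %.3f" % m_estimate(p, n, 0, prior),
        "m=1000: %.4f" % m_estimate(p, n, 1000, prior),
    )
```

In this example precision prefers the small pure rule A, while the difference with `c = 1` prefers the larger rule B. The m-estimate agrees with precision at `m = 0` and switches to preferring B once `m` is large, which is the trade-off the abstract attributes to the generalized m-estimate.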