C4.5: Programs for Machine Learning.
Machine Learning, Neural and Statistical Classification.
Wrappers for Performance Enhancement and Oblivious Decision Graphs.
Wrappers for feature subset selection. Artificial Intelligence (Special Issue on Relevance).
Learning Logical Definitions from Relations. Machine Learning.
Consensus Decision Trees: Using Consensus Hierarchical Clustering for Data Relabelling and Reduction. EMCL '01: Proceedings of the 12th European Conference on Machine Learning.
A Hybrid Technique for Data Mining on Balance-Sheet Data. DaWaK 2000: Proceedings of the Second International Conference on Data Warehousing and Knowledge Discovery.
Overfitting in making comparisons between variable selection methods. The Journal of Machine Learning Research.
Combining inductive and deductive tools for data analysis. AI Communications.
Improving inductive logic programming by using simulated annealing. Information Sciences: An International Journal.
Structural Risk Minimisation based gene expression profiling analysis. International Journal of Bioinformatics Research and Applications.
Extension of the Top-Down Data-Driven Strategy to ILP. Inductive Logic Programming.
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning.
To rank the performance of machine learning algorithms, many researchers conduct experiments on benchmark data sets. Since most learning algorithms have domain-specific parameters, it is common practice to tune these parameters to obtain a minimal error rate on the test set. That same error rate is then used to rank the algorithm, which introduces an optimistic bias. We quantify this bias and show, in particular, that an algorithm with more parameters will probably be ranked higher than an equally good algorithm with fewer parameters. We illustrate this result by showing, for various benchmark problems, how many parameters and trials are required to appear to outperform C4.5 or FOIL. We then describe how unbiased ranking experiments should be conducted.
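The effect the abstract describes can be reproduced in a few lines. The following is a minimal Monte Carlo sketch, not taken from the paper: it assumes k parameter configurations of an equally good classifier, all with the same true error rate p, each evaluated on the same test set of n examples, under the simplifying assumption that their measured errors are independent Binomial(n, p) draws. Reporting the minimum measured error over the k configurations then systematically underestimates p, and the bias grows with k. All names and numbers below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def expected_min_test_error(p, n_test, n_configs, n_trials=20_000):
    """Monte Carlo estimate of the expected minimum measured error when
    n_configs equally good configurations are evaluated on one test set.

    Simplifying assumption (illustrative, not from the paper): each
    configuration's measured error is an independent Binomial(n_test, p)
    draw divided by n_test.
    """
    # One row per simulated experiment, one column per configuration.
    errors = rng.binomial(n_test, p, size=(n_trials, n_configs)) / n_test
    # Taking the minimum over configurations models tuning on the test set.
    return errors.min(axis=1).mean()

if __name__ == "__main__":
    p, n_test = 0.20, 200  # hypothetical true error rate and test-set size
    for k in (1, 2, 5, 10, 50):
        print(f"{k:3d} configurations -> expected reported error "
              f"{expected_min_test_error(p, n_test, k):.3f}")
```

Under this toy model, the reported error for 50 configurations falls well below the true 20% even though every configuration is equally good. The standard remedy, consistent with the abstract's conclusion, is to select parameters on a separate validation set and report only the selected configuration's error on an untouched test set.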