C4.5: Programs for Machine Learning.
Machine Learning, Neural and Statistical Classification.
Wrappers for Performance Enhancement and Oblivious Decision Graphs.
Wrappers for feature subset selection. Artificial Intelligence (Special Issue on Relevance).
Learning Logical Definitions from Relations. Machine Learning.
Consensus Decision Trees: Using Consensus Hierarchical Clustering for Data Relabelling and Reduction. EMCL '01: Proceedings of the 12th European Conference on Machine Learning.
A Hybrid Technique for Data Mining on Balance-Sheet Data. DaWaK 2000: Proceedings of the Second International Conference on Data Warehousing and Knowledge Discovery.
Overfitting in making comparisons between variable selection methods. The Journal of Machine Learning Research.
Combining inductive and deductive tools for data analysis. AI Communications.
Improving inductive logic programming by using simulated annealing. Information Sciences: An International Journal.
Structural Risk Minimisation based gene expression profiling analysis. International Journal of Bioinformatics Research and Applications.
Extension of the Top-Down Data-Driven Strategy to ILP. Inductive Logic Programming.
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning.
To rank the performance of machine learning algorithms, many researchers conduct experiments on benchmark data sets. Since most learning algorithms have domain-specific parameters, it is common practice to tune these parameters to obtain a minimal error rate on the test set. That same error rate is then used to rank the algorithm, which introduces an optimistic bias. We quantify this bias and show, in particular, that an algorithm with more parameters will probably be ranked higher than an equally good algorithm with fewer parameters. We illustrate this result by showing, for various benchmark problems, how many parameters and trials are required to appear to outperform C4.5 or FOIL. We then describe how unbiased ranking experiments should be conducted.
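The effect the abstract describes can be reproduced in a few lines. The following is a minimal Monte Carlo sketch, not taken from the paper: it assumes k parameter configurations of an equally good classifier, all with the same true error rate p, each evaluated on the same test set of n examples, under the simplifying assumption that their measured errors are independent Binomial(n, p) draws. Reporting the minimum measured error over the k configurations then systematically underestimates p, and the bias grows with k. All names and numbers below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def expected_min_test_error(p, n_test, n_configs, n_trials=20_000):
    """Monte Carlo estimate of the expected minimum measured error when
    n_configs equally good configurations are evaluated on one test set.

    Simplifying assumption (illustrative, not from the paper): each
    configuration's measured error is an independent Binomial(n_test, p)
    draw divided by n_test.
    """
    # One row per simulated experiment, one column per configuration.
    errors = rng.binomial(n_test, p, size=(n_trials, n_configs)) / n_test
    # Taking the minimum over configurations models tuning on the test set.
    return errors.min(axis=1).mean()

if __name__ == "__main__":
    p, n_test = 0.20, 200  # hypothetical true error rate and test-set size
    for k in (1, 2, 5, 10, 50):
        print(f"{k:3d} configurations -> expected reported error "
              f"{expected_min_test_error(p, n_test, k):.3f}")
```

Under this toy model, the reported error for 50 configurations falls well below the true 20% even though every configuration is equally good. The standard remedy, consistent with the abstract's conclusion, is to select parameters on a separate validation set and report only the selected configuration's error on an untouched test set.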