Selecting classification algorithms with active testing

  • Authors:
  • Rui Leite; Pavel Brazdil; Joaquin Vanschoren

  • Affiliations:
  • LIAAD-INESC Porto L.A. / Faculty of Economics, University of Porto, Portugal; LIAAD-INESC Porto L.A. / Faculty of Economics, University of Porto, Portugal; LIACS - Leiden Institute of Advanced Computer Science, University of Leiden, Netherlands

  • Venue:
  • MLDM'12: Proceedings of the 8th International Conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2012


Abstract

Given the large number of data mining algorithms, their combinations (e.g. ensembles) and possible parameter settings, finding the most suitable method to analyze a new dataset becomes an ever more challenging task, because in many cases testing all potentially useful alternatives quickly becomes prohibitively expensive. In this paper we propose a novel technique, called active testing, that intelligently selects the most useful cross-validation tests. It proceeds in a tournament-style fashion: in each round it selects and tests the algorithm that is most likely to outperform the best algorithm of the previous round on the new dataset. This 'most promising' competitor is chosen based on a history of prior duels between both algorithms on similar datasets. Each new cross-validation test contributes information to a better estimate of dataset similarity, and thus to better predictions of which algorithms are most promising on the new dataset. We have evaluated this approach using a set of 292 algorithm-parameter combinations on 76 UCI classification datasets. The results show that active testing quickly yields an algorithm whose performance is very close to the optimum, after relatively few tests. It also provides a better solution than previously proposed methods.
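
To make the tournament idea concrete, the sketch below outlines one plausible reading of the active testing loop: keep a current champion, use prior duels to pick the challenger with the largest expected gain, run an actual cross-validation test, and promote the challenger if it wins. All names (`active_testing`, `cv_test`, `history`) and the uniform weighting of prior datasets are illustrative assumptions, not the authors' implementation; in particular, the full method also re-weights prior datasets by similarity as new test results arrive.

```python
import numpy as np

def active_testing(candidates, history, new_dataset, cv_test, max_tests=10):
    """Illustrative sketch of tournament-style active testing.

    candidates  -- list of algorithm identifiers (hypothetical names)
    history     -- dict mapping (challenger, champion) pairs to an array of
                   performance differences observed in prior "duels" on
                   earlier datasets
    new_dataset -- the dataset to analyse
    cv_test     -- callable(algorithm, dataset) -> cross-validated score
    """
    # Start with an arbitrary champion (e.g. the algorithm that is best
    # on average over prior datasets).
    champion = candidates[0]
    scores = {champion: cv_test(champion, new_dataset)}

    for _ in range(max_tests):
        untested = [a for a in candidates if a not in scores]
        if not untested:
            break

        # Estimate, for each untested algorithm, how much it is expected to
        # gain over the current champion based on prior duels.  Uniform
        # weighting over prior datasets is used here; the full method
        # re-weights datasets by similarity as new CV results arrive.
        def expected_gain(algo):
            duels = history.get((algo, champion), np.array([]))
            return float(np.mean(np.maximum(duels, 0.0))) if duels.size else 0.0

        challenger = max(untested, key=expected_gain)

        # Run the actual cross-validation test on the new dataset and
        # promote the challenger if it beats the current champion.
        scores[challenger] = cv_test(challenger, new_dataset)
        if scores[challenger] > scores[champion]:
            champion = challenger

    return champion, scores
```

Under these assumptions, each iteration spends one cross-validation test on the candidate judged most likely to improve on the incumbent, which is why the procedure can approach the best available algorithm after relatively few tests.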