As companies employ a growing number of models, the problem of algorithm (and parameter) selection is becoming increasingly important. Two approaches to obtaining empirical knowledge that is useful for this purpose are empirical studies and metalearning. However, most empirical (meta)knowledge is obtained from a relatively small set of datasets. In this paper, we propose a method for obtaining a large number of datasets, referred to as datasetoids, based on a simple transformation of existing datasets. We test our approach on the problem of using metalearning to predict when to prune decision trees. The results show a significant improvement when datasetoids are used. Additionally, we identify a number of potential anomalies in the generated datasetoids and propose methods to solve them.
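The abstract does not define the "simple transformation". As an illustration only, here is a minimal sketch under one assumed reading: each datasetoid takes one nominal attribute of the original dataset as its new target, while the original target is kept as an ordinary attribute. The function name `make_datasetoids`, the tuple-based row format and the class-is-last-column convention are all hypothetical choices for this sketch, not the authors' implementation. The sketch does not handle the anomalies mentioned above.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]
Dataset = Tuple[List[str], List[Row]]  # (header, rows); target is the last column


def make_datasetoids(header: Sequence[str], rows: Sequence[Row],
                     target_index: int,
                     nominal_indices: Sequence[int]) -> List[Dataset]:
    """Build one datasetoid per nominal attribute other than the original target.

    Assumption (not stated in the abstract): a datasetoid is the original data
    with a different nominal attribute used as the class. Here the new target
    is moved to the last column and every other column, including the
    original target, keeps its relative order as an ordinary attribute.
    """
    datasetoids: List[Dataset] = []
    for j in nominal_indices:
        if j == target_index:
            continue  # the original target does not yield a new datasetoid
        # Column order for this datasetoid: all columns except j, then j last.
        order = [i for i in range(len(header)) if i != j] + [j]
        new_header = [header[i] for i in order]
        new_rows = [tuple(row[i] for i in order) for row in rows]
        datasetoids.append((new_header, new_rows))
    return datasetoids


if __name__ == "__main__":
    # Toy dataset: two nominal attributes plus the original target "play".
    header = ["outlook", "windy", "play"]
    rows = [("sunny", "yes", "no"), ("rain", "no", "yes")]
    for new_header, new_rows in make_datasetoids(header, rows, 2, [0, 1, 2]):
        print(new_header, new_rows)
```

Under this reading, a dataset with k nominal attributes besides its target yields k datasetoids, which is how a modest collection of datasets could grow into a much larger pool of meta-examples.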