When selecting the most appropriate algorithm to apply to a new data set, data analysts often follow an approach comparable to test-driving cars before deciding which one to buy: they run the algorithms on a sample of the data to quickly obtain rough estimates of their performance, and use these estimates to select one or a few algorithms to try on the full data set. We describe sampling-based landmarks (SL), a systematization of this approach that builds on earlier work on landmarking and sampling. SL are estimates of the performance of algorithms on a small sample of the data, used as predictors of the performance of those algorithms on the full set. We also describe relative landmarks (RL), which address the inability of earlier landmarks to assess the relative performance of algorithms: RL aggregate landmarks to obtain predictors of relative performance. Our experiments indicate that the combination of these two improvements, which we call Sampling-based Relative Landmarks, is better for ranking algorithms than traditional data characterization measures.
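The core idea can be sketched in a few lines: fit each candidate algorithm on a small sample, record the resulting accuracy estimates (the sampling-based landmarks), and use the pairwise comparison of those estimates (the relative landmark) to predict which algorithm to prefer. The sketch below is illustrative only, under stated assumptions: the two toy learners (a threshold rule and a majority-class baseline), the synthetic data, and all names are hypothetical and not taken from the paper.

```python
# Minimal sketch of sampling-based relative landmarks: estimate each
# algorithm's accuracy using a small sample and rank algorithms by the
# *relative* order of those estimates. Toy learners and synthetic data
# are illustrative assumptions, not the paper's experimental setup.
import random

random.seed(0)

def make_data(n):
    # Synthetic binary task: x in [0, 1), label = 1 iff x > 0.5,
    # with 10% label noise.
    data = []
    for _ in range(n):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data

def threshold_learner(train):
    # Pick the threshold (from a coarse grid) maximizing training accuracy.
    best_t, best_acc = 0.5, -1.0
    for t in [i / 10 for i in range(1, 10)]:
        acc = sum(int(x > t) == y for x, y in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x: int(x > best_t)

def majority_learner(train):
    # Baseline: always predict the majority class of the training data.
    majority = int(sum(y for _, y in train) >= len(train) / 2)
    return lambda x: majority

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

sample = make_data(50)      # small sample used to compute the landmarks
holdout = make_data(500)    # held-out data for the accuracy estimates

algorithms = {"threshold": threshold_learner, "majority": majority_learner}

# Sampling-based landmarks: cheap performance estimates from the sample.
landmarks = {name: accuracy(fit(sample), holdout)
             for name, fit in algorithms.items()}

# Relative landmark for the pair: which algorithm looks better on the sample?
pair = ("threshold", "majority")
relative = pair[0] if landmarks[pair[0]] >= landmarks[pair[1]] else pair[1]
print("landmarks:", landmarks)
print("predicted better algorithm:", relative)
```

In the full scheme, such pairwise predictions would be aggregated across many algorithm pairs to produce a ranking, and only the top-ranked algorithms would be run on the complete data set.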