Instance selection is becoming increasingly relevant because of the huge amount of data constantly being produced in many fields of research. Although current algorithms handle fairly large datasets, they scale poorly when the number of instances reaches the hundreds of thousands or millions, and most of them become inapplicable. Thus, paradoxically, instance selection algorithms are for the most part impracticable for the very problems that would benefit most from their use. This paper presents a way of avoiding this difficulty using several rounds of instance selection on subsets of the original dataset. The results of these rounds are combined using a voting scheme, which yields good performance in terms of testing error and storage reduction while significantly reducing the execution time of the process. The method is particularly efficient when the underlying instance selection algorithm has a high computational cost. The proposed approach shares the philosophy underlying the construction of ensembles of classifiers. In an ensemble, several weak learners are combined to form a strong classifier; in our method, several weak instance selection algorithms (weak in the sense that they are applied to subsets of the data) are combined to produce a strong and fast instance selection method. An extensive comparison on 30 medium and large datasets from the UCI Machine Learning Repository using 3 different classifiers shows the usefulness of our method. Additionally, the method is applied to 5 huge datasets (from three hundred thousand to more than a million instances) with good results and fast execution time.
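The round-and-vote idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify: each round splits the data into disjoint random subsets, votes are counted for removal, and the vote threshold is a fixed parameter. The names `vote_select` and `nn_edit` are hypothetical, and `nn_edit` is only a simple stand-in for whatever (typically costly) instance selection algorithm would be used in practice.

```python
import random
from typing import Callable, List, Sequence, Tuple

Instance = Tuple[Sequence[float], int]  # (feature vector, class label)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nn_edit(subset: List[Instance]) -> List[bool]:
    """Stand-in base selector (an assumption, not the paper's algorithm).

    Keeps an instance if its nearest neighbour inside the subset has the
    same label. Returns one keep flag per instance of the subset.
    """
    keep = []
    for i, (xi, yi) in enumerate(subset):
        best_j, best_d = None, float("inf")
        for j, (xj, _) in enumerate(subset):
            if j != i:
                d = sq_dist(xi, xj)
                if d < best_d:
                    best_j, best_d = j, d
        # An instance with no neighbour (subset of size 1) is kept.
        keep.append(best_j is None or subset[best_j][1] == yi)
    return keep


def vote_select(
    data: List[Instance],
    rounds: int = 5,
    subset_size: int = 50,
    threshold: int = 3,
    select: Callable[[List[Instance]], List[bool]] = nn_edit,
    seed: int = 0,
) -> List[int]:
    """Return the indices of the instances kept after voting.

    In each round the data is shuffled and cut into disjoint subsets, the
    base selector runs on every subset, and each instance it discards
    receives one removal vote. Instances with fewer than `threshold`
    removal votes are kept (fixed threshold: an assumption of this sketch).
    """
    rng = random.Random(seed)
    removal_votes = [0] * len(data)
    for _ in range(rounds):
        order = list(range(len(data)))
        rng.shuffle(order)
        for start in range(0, len(order), subset_size):
            idx = order[start:start + subset_size]
            flags = select([data[i] for i in idx])
            for i, kept in zip(idx, flags):
                if not kept:
                    removal_votes[i] += 1
    return [i for i, v in enumerate(removal_votes) if v < threshold]
```

Because the base selector only ever sees `subset_size` instances at a time, a selector with quadratic cost, such as the stand-in above, does work proportional to the dataset size times `subset_size` per round rather than to the square of the dataset size. This is one way to read the abstract's claim that the approach pays off most for computationally expensive selectors.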