Instance selection is becoming increasingly relevant because of the huge amount of data that is constantly being produced. However, although current algorithms are useful for fairly large datasets, they run into scaling problems when the number of instances reaches hundreds of thousands or millions. In the best case, these algorithms have complexity O(n^2), n being the number of instances. For huge problems scalability becomes an issue, and most algorithms are not applicable. This paper presents a recursive divide-and-conquer approach to instance selection for instance-based learning on very large problems. Our method divides the original training set into small subsets, and the instance selection algorithm is applied to each subset. The selected instances are then rejoined into a new training set, and the same procedure (partitioning followed by application of an instance selection algorithm) is repeated. In this way, our approach applies the divide-and-conquer philosophy recursively. The proposed method is able to match the results of well-known standard algorithms, and even improve on them in terms of storage reduction, with a very significant reduction in execution time. An extensive comparison on 30 datasets from the UCI Machine Learning Repository shows the usefulness of our method. Additionally, the method is applied to 5 huge datasets, ranging from 300,000 to more than a million instances, with very good results and fast execution times.
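The partition, select, and rejoin loop described above can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the function names (`recursive_select`, `cnn_select`, `make_toy_data`), the random partitioning, the subset size, and the stopping rule (stop when a single subset remains, when no reduction occurs, or after `max_rounds`) are all assumptions, since the abstract does not specify them. Hart's condensed nearest neighbor rule is used only as a stand-in base selector; any instance selection algorithm with the same interface could be plugged in.

```python
import random
from typing import Callable, List, Tuple

Instance = Tuple[Tuple[float, ...], int]  # (feature vector, class label)
Selector = Callable[[List[Instance]], List[Instance]]


def sq_dist(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cnn_select(subset: List[Instance]) -> List[Instance]:
    """Stand-in base selector (assumption): Hart's condensed nearest neighbor.

    Keeps an instance whenever the instances kept so far would misclassify
    it under the 1-NN rule, and repeats until a full pass adds nothing.
    """
    if not subset:
        return []
    kept_idx = [0]
    kept_set = {0}
    changed = True
    while changed:
        changed = False
        for i, (features, label) in enumerate(subset):
            if i in kept_set:
                continue
            nearest = min(kept_idx, key=lambda k: sq_dist(subset[k][0], features))
            if subset[nearest][1] != label:
                kept_idx.append(i)
                kept_set.add(i)
                changed = True
    return [subset[i] for i in kept_idx]


def recursive_select(data: List[Instance], select: Selector,
                     subset_size: int = 100, max_rounds: int = 10,
                     seed: int = 0) -> List[Instance]:
    """Repeatedly partition, apply `select` to each subset, and rejoin.

    The stopping rule here is hypothetical: stop once everything fits in
    one subset, once a round removes nothing, or after `max_rounds`.
    """
    rng = random.Random(seed)
    current = list(data)
    for _ in range(max_rounds):
        rng.shuffle(current)  # random partitioning (assumption)
        chunks = [current[i:i + subset_size]
                  for i in range(0, len(current), subset_size)]
        reduced = [inst for chunk in chunks for inst in select(chunk)]
        done = len(chunks) <= 1 or len(reduced) == len(current)
        current = reduced
        if done:
            break
    return current


def make_toy_data(n: int, seed: int = 0) -> List[Instance]:
    """Two well-separated 2-D Gaussian clusters, for illustration only."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        center = 10.0 * label
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data
```

A call such as `recursive_select(make_toy_data(1000), cnn_select, subset_size=100)` returns a reduced list of instances drawn from the original data. The appeal of the scheme is visible in the cost of one round: if the base selector is quadratic in its input and the subset size s is fixed, processing n/s subsets costs on the order of (n/s)·s^2 = n·s, which grows linearly in n rather than quadratically.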