Democratic instance selection: A linear complexity instance selection algorithm based on classifier ensemble concepts

Authors:
César García-Osorio;Aida de Haro-García;Nicolás García-Pedrajas
Affiliations:
Department of Civil Engineering of the University of Burgos, Spain;Department of Computing and Numerical Analysis of the University of Córdoba, Spain;Department of Computing and Numerical Analysis of the University of Córdoba, Spain
Venue:
Artificial Intelligence
Year:
2010

Abstract

Instance selection is becoming increasingly relevant due to the huge amount of data that is constantly being produced in many fields of research. Although current algorithms are useful for fairly large datasets, most of them scale poorly and stop being applicable when the number of instances reaches the hundreds of thousands or millions. Thus, paradoxically, instance selection algorithms are for the most part impracticable for the same problems that would benefit most from their use. This paper presents a way of avoiding this difficulty by running several rounds of instance selection on subsets of the original dataset. The results of these rounds are combined using a voting scheme, which allows good performance in terms of testing error and storage reduction while significantly reducing the execution time of the process. The method is particularly efficient when the underlying instance selection algorithms have a high computational cost. The proposed approach shares the philosophy underlying the construction of ensembles of classifiers. In an ensemble, several weak learners are combined to form a strong classifier; in our method, several weak instance selection algorithms (weak in the sense that they are applied to subsets of the data) are combined to produce a strong and fast instance selection method. An extensive comparison on 30 medium and large datasets from the UCI Machine Learning Repository using 3 different classifiers shows the usefulness of our method. Additionally, the method is applied to 5 huge datasets (from three hundred thousand to more than a million instances) with good results and fast execution time.
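
The idea of running several rounds of instance selection on subsets and combining them by voting can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the random disjoint partitioning, the fixed removal threshold, and the 1-nearest-neighbour editing rule used as the base selector are all assumptions made here for illustration, since the abstract does not specify them.

```python
import random


def enn_1nn(points, labels):
    """Stand-in base selector (an assumption, not taken from the paper):
    keep a point only if its nearest neighbour in the subset shares its label.
    Returns the local indices of the points that are kept."""
    keep = []
    for i, p in enumerate(points):
        best_j, best_d = None, float("inf")
        for j, q in enumerate(points):
            if i == j:
                continue
            d = sum((a - b) ** 2 for a, b in zip(p, q))
            if d < best_d:
                best_j, best_d = j, d
        # A point that is alone in its subset has no neighbour and is kept.
        if best_j is None or labels[best_j] == labels[i]:
            keep.append(i)
    return keep


def democratic_selection(points, labels, select=enn_1nn, rounds=10,
                         subset_size=50, threshold=0.5, seed=0):
    """Hypothetical voting wrapper: in each round the data is split at random
    into disjoint subsets, the base selector runs on each subset, and every
    discarded instance receives one removal vote. Instances whose votes reach
    threshold * rounds are removed. The fixed threshold is an assumption."""
    rng = random.Random(seed)
    n = len(points)
    removal_votes = [0] * n
    for _ in range(rounds):
        order = list(range(n))
        rng.shuffle(order)
        for start in range(0, n, subset_size):
            block = order[start:start + subset_size]
            kept_local = set(select([points[i] for i in block],
                                    [labels[i] for i in block]))
            for local, global_idx in enumerate(block):
                if local not in kept_local:
                    removal_votes[global_idx] += 1
    # Return the global indices of the retained instances, in ascending order.
    return [i for i in range(n) if removal_votes[i] < threshold * rounds]


if __name__ == "__main__":
    # Two well-separated 1-D clusters plus one mislabelled point at x = 5.0.
    pts = [(i / 10,) for i in range(10)] + [(100 + i / 10,) for i in range(10)]
    pts.append((5.0,))
    lbl = [0] * 10 + [1] * 10 + [1]
    print(democratic_selection(pts, lbl, rounds=20, subset_size=7))
```

Under these assumptions the cost argument is easy to see: if the subset size s is held fixed and the base selector costs on the order of s² per subset, one round processes n/s subsets for a total on the order of n·s, which grows linearly with the number of instances n.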