Instance selection is becoming increasingly relevant because of the huge amount of data that is constantly being produced. Current algorithms are useful for fairly large datasets, but they run into serious scaling problems when the number of instances reaches hundreds of thousands or millions. Most widely used instance selection algorithms have a complexity of at least O(n^2), where n is the number of instances. For very large problems, scalability becomes an issue, and most of these algorithms are not applicable. This paper presents a methodology for scaling up instance selection algorithms by means of a parallel procedure that performs instance selection on small subsets of the original dataset. The results of applying instance selection to these small subsets are combined using a voting scheme. The method achieves very good performance in terms of testing error and storage reduction, while reducing the execution time of the process very significantly. The parallel algorithm also removes any constraint imposed by memory size, because the whole dataset does not need to be stored in memory. The usefulness of our method is shown by an extensive comparison using 35 datasets of medium and large sizes from the UCI Machine Learning Repository. Additionally, our method is applied to eight very large datasets with very good results and fast execution times.
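A minimal sketch of the partition-and-vote idea, written in Python under several assumptions that the abstract does not fix. Wilson's edited nearest neighbour rule (ENN, k = 3) stands in for the base instance selection algorithm. Each round uses a fresh random partition into disjoint subsets, and an instance is discarded when it is voted out in at least a fixed fraction of rounds. The names `enn_remove`, `vote_select`, `subset_size`, `rounds` and `threshold` are hypothetical. If the base algorithm costs O(s^2) on a subset of s instances, then one round over n/s subsets costs (n/s)·s^2 = n·s. For a fixed subset size, that cost is linear in n, which is where the speed-up comes from.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
# A base selector takes (X, y) for one subset and returns the local indices it removes.
Selector = Callable[[Sequence[Point], Sequence[int]], List[int]]


def enn_remove(X: Sequence[Point], y: Sequence[int], k: int = 3) -> List[int]:
    """Wilson's ENN as an assumed stand-in base selector (O(s^2) per subset).

    Returns the local indices whose label disagrees with the majority label
    of their k nearest neighbours (squared Euclidean distance).
    """
    removed = []
    for i, xi in enumerate(X):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
            for j, xj in enumerate(X)
            if j != i
        )
        neighbours = [y[j] for _, j in dists[:k]]
        if not neighbours:  # singleton subset: nothing to compare against
            continue
        majority = Counter(neighbours).most_common(1)[0][0]
        if majority != y[i]:
            removed.append(i)
    return removed


def vote_select(
    X: Sequence[Point],
    y: Sequence[int],
    base: Selector = enn_remove,
    subset_size: int = 50,
    rounds: int = 10,
    threshold: float = 0.5,
    seed: int = 0,
) -> List[int]:
    """Hypothetical partition-and-vote wrapper; returns the indices kept."""
    rng = random.Random(seed)
    n = len(X)
    votes = [0] * n  # how many rounds voted to remove each instance
    order = list(range(n))
    for _ in range(rounds):
        rng.shuffle(order)  # new random partition every round
        for start in range(0, n, subset_size):
            part = order[start:start + subset_size]
            # Subsets are independent: this call is the parallelizable unit.
            local = base([X[i] for i in part], [y[i] for i in part])
            for j in local:
                votes[part[j]] += 1  # map local index back to global index
    return [i for i in range(n) if votes[i] < threshold * rounds]


if __name__ == "__main__":
    # Two well-separated classes on a line plus one mislabeled point (index 40).
    X = [(i / 20,) for i in range(20)] + [(10 + i / 20,) for i in range(20)] + [(0.525,)]
    y = [0] * 20 + [1] * 20 + [1]
    kept = vote_select(X, y, subset_size=50, rounds=5)
    print(len(kept), 40 in kept)  # prints: 40 False
```

The subsets within a round are independent, so the inner loop is the part that could be spread across processes or machines. It runs sequentially here for clarity. Each base call only sees one subset, although this sketch still holds the full dataset in memory. The fixed `threshold` is an assumption, because the abstract does not say how the vote threshold is chosen.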