Instance-Based Learning Algorithms
Machine Learning
Recursive Automatic Bias Selection for Classifier Construction
Machine Learning - Special issue on bias evaluation and selection
Selection of relevant features and examples in machine learning
Artificial Intelligence - Special issue on relevance
Wrappers for feature subset selection
Artificial Intelligence - Special issue on relevance
Nearest neighbor classifier: simultaneous editing and feature selection
Pattern Recognition Letters - Special issue on pattern recognition in practice VI
Multidimensional binary search trees used for associative searching
Communications of the ACM
Adaptive Intrusion Detection: A Data Mining Approach
Artificial Intelligence Review - Issues on the application of data mining
Customer Retention via Data Mining
Artificial Intelligence Review - Issues on the application of data mining
Unsupervised Feature Selection Using Feature Similarity
IEEE Transactions on Pattern Analysis and Machine Intelligence
Feature Selection for Knowledge Discovery and Data Mining
Feature Selection for Knowledge Discovery and Data Mining
A Survey of Methods for Scaling Up Inductive Algorithms
Data Mining and Knowledge Discovery
On Issues of Instance Selection
Data Mining and Knowledge Discovery
Advances in Instance Selection for Instance-Based Learning Algorithms
Data Mining and Knowledge Discovery
Feature selection for high-dimensional genomic microarray data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
AIME '01 Proceedings of the 8th Conference on AI in Medicine in Europe: Artificial Intelligence Medicine
Feature Selection for Clustering - A Filter Solution
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Theoretical and Empirical Analysis of ReliefF and RReliefF
Machine Learning
Learning Ensembles from Bites: A Scalable and Accurate Approach
The Journal of Machine Learning Research
Stratification for scaling up evolutionary prototype selection
Pattern Recognition Letters
Instance Selection and Feature Weighting Using Evolutionary Algorithms
CIC '06 Proceedings of the 15th International Conference on Computing
Statistical Comparisons of Classifiers over Multiple Data Sets
The Journal of Machine Learning Research
Nonlinear Boosting Projections for Ensemble Construction
The Journal of Machine Learning Research
Translation initiation site prediction on a genomic scale
Bioinformatics
A memetic algorithm for evolutionary prototype selection: A scaling up approach
Pattern Recognition
A divide-and-conquer recursive approach for scaling up instance selection algorithms
Data Mining and Knowledge Discovery
Evolutionary undersampling for classification with imbalanced datasets: Proposals and taxonomy
Evolutionary Computation
Design of nearest neighbor classifiers: multi-objective approach
International Journal of Approximate Reasoning
A Parameter-Free Classification Method for Large Scale Learning
The Journal of Machine Learning Research
Simultaneous feature selection and classification using kernel-penalized support vector machines
Information Sciences: an International Journal
Information Sciences: an International Journal
Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study
IEEE Transactions on Pattern Analysis and Machine Intelligence
An orthogonal genetic algorithm with quantization for globalnumerical optimization
IEEE Transactions on Evolutionary Computation
Performance assessment of multiobjective optimizers: an analysis and review
IEEE Transactions on Evolutionary Computation
Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study
IEEE Transactions on Evolutionary Computation
Fast generic selection of features for neural network classifiers
IEEE Transactions on Neural Networks
Hi-index | 0.07 |
An enormous amount of information is continually being produced in current research, which poses a challenge for data mining algorithms. Many of the problems in extremely active research areas, such as bioinformatics, security and intrusion detection and text mining, involve large or enormous datasets. These datasets pose serious problems for many data mining algorithms. One method to address very large datasets is data reduction. Among the most useful data reduction methods is simultaneous instance and feature selection. This method achieves a considerable reduction in the training data while maintaining, or even improving, the performance of the data-mining algorithm. However, it suffers from a high degree of scalability problems, even for medium-sized datasets. In this paper, we propose a new evolutionary simultaneous instance and feature selection algorithm that is scalable to millions of instances and thousands of features. This proposal is based on the divide-and-conquer principle combined with bookkeeping. The divide-and-conquer principle allows the execution of the algorithm in linear time. Furthermore, the proposed method is easy to implement using a parallel environment and can work without loading the entire dataset into memory. Using 50 medium-sized datasets, we will demonstrate our method's ability to match the results of state-of-the-art instance and feature selection methods while significantly reducing the time requirements. Using 13 very large datasets, we will demonstrate the scalability of our proposal to millions of instances and thousands of features.