In this paper, we present a new approach to training set selection in large data sets. The algorithm combines stratification with evolutionary algorithms. Stratification reduces the size of the domain over which selection is applied, while the evolutionary method selects the most representative instances. The performance of the proposal is compared with that of seven non-evolutionary algorithms, also run in stratified execution. The analysis follows two evaluation approaches: the balance between reduction and accuracy of the selected subsets, and the balance between interpretability and accuracy of the representation models associated with these subsets. The algorithms have been assessed on large and huge data sets. The study shows that stratified evolutionary instance selection consistently outperforms the non-evolutionary algorithms. Its main advantages are high instance reduction rates, high classification accuracy, and highly interpretable models.
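The combination of stratification and evolutionary selection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the evolutionary algorithm or its fitness function, so the plain generational genetic algorithm, the weighted fitness (1-NN accuracy plus reduction rate, with an assumed weight `alpha`), and every function name and parameter below are assumptions chosen only to make the idea concrete.

```python
import random

def nn_accuracy(stratum, mask):
    """Leave-one-out 1-NN accuracy on the stratum, using only selected instances as references."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    hits = 0
    for i, (x, label) in enumerate(stratum):
        refs = [j for j in chosen if j != i]  # an instance may not be its own neighbour
        if not refs:
            continue  # no usable reference: counted as a miss
        nearest = min(refs, key=lambda j: sum((a - b) ** 2 for a, b in zip(x, stratum[j][0])))
        hits += stratum[nearest][1] == label
    return hits / len(stratum)

def fitness(stratum, mask, alpha=0.5):
    """Assumed fitness: weighted sum of accuracy and reduction rate."""
    reduction = 1.0 - sum(mask) / len(mask)
    return alpha * nn_accuracy(stratum, mask) + (1 - alpha) * reduction

def evolve(stratum, rng, pop_size=10, generations=20, p_mut=0.05):
    """Generic generational GA over binary masks (a stand-in, not the paper's algorithm)."""
    n = len(stratum)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(stratum, m), reverse=True)
        parents = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = [bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda m: fitness(stratum, m))
    return [inst for inst, bit in zip(stratum, best) if bit]

def stratified_selection(data, n_strata, seed=0):
    """Split the data into disjoint random strata, select within each, and merge the results."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    strata = [shuffled[i::n_strata] for i in range(n_strata)]
    selected = []
    for stratum in strata:
        selected.extend(evolve(stratum, rng))
    return selected

# Toy data: two Gaussian clusters, each instance is ((x, y), label).
gen = random.Random(1)
data = [((gen.gauss(0, 1), gen.gauss(0, 1)), 0) for _ in range(30)]
data += [((gen.gauss(4, 1), gen.gauss(4, 1)), 1) for _ in range(30)]
selected = stratified_selection(data, n_strata=3)
print(len(data), "instances reduced to", len(selected))
```

Because each stratum holds only a fraction of the data, every fitness evaluation works on a much smaller search space and neighbour search than it would on the full set, which is the scaling benefit the abstract attributes to stratification.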