On the combination of evolutionary algorithms and stratified strategies for training set selection in data mining

Authors:
José Ramón Cano;Francisco Herrera;Manuel Lozano
Affiliations:
Department of Computer Science, University of Jaén, 23071 Jaén, Spain;Department of Computer Science and Artificial Intelligence, ETS de Ingeniería Informática, University of Granada, 18071 Granada, Spain;Department of Computer Science and Artificial Intelligence, ETS de Ingeniería Informática, University of Granada, 18071 Granada, Spain
Venue:
Applied Soft Computing
Year:
2006

Abstract

In this paper, we present a new approach for training set selection in large data sets. The algorithm combines stratification with evolutionary algorithms: stratification reduces the size of the domain in which the selection is applied, while the evolutionary method selects the most representative instances. The performance of the proposal is compared with that of seven non-evolutionary algorithms, also run in stratified mode. The analysis follows two evaluation approaches: the balance between reduction and accuracy of the selected subsets, and the balance between interpretability and accuracy of the representation models associated with these subsets. The algorithms have been assessed on large and very large data sets. The study shows that stratified evolutionary instance selection consistently outperforms the non-evolutionary algorithms. Its main advantages are high instance reduction rates, high classification accuracy and highly interpretable models.
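The abstract describes the method only at a high level, so the following is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. Assumed details: strata are built by dealing each class's instances round-robin into t disjoint subsets; each stratum is searched by a plain generational genetic algorithm over binary inclusion masks (not necessarily the evolutionary algorithm used in the paper); fitness is the weighted sum alpha * accuracy + (1 - alpha) * reduction, with leave-one-out 1-NN accuracy on the stratum standing in for the classifier; the per-stratum selections are then joined. All function names and parameter values (t, alpha, pop, gens, p_mut) are hypothetical.

```python
import random
from collections import defaultdict

# Minimal sketch of stratified evolutionary instance selection.
# ASSUMPTIONS (not taken from the paper): round-robin class-preserving strata,
# a plain generational GA, leave-one-out 1-NN accuracy, and the weighted
# fitness alpha * accuracy + (1 - alpha) * reduction. Names are hypothetical.


def stratify(y, t, rng):
    """Split instance indices into t disjoint strata with a similar class mix."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    strata = [[] for _ in range(t)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for k, i in enumerate(idxs):
            strata[k % t].append(i)  # deal each class round-robin
    return strata


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def one_nn_accuracy(X, y, subset, eval_idx):
    """1-NN accuracy on eval_idx using only `subset` as prototypes.
    An instance is never its own neighbour (leave-one-out)."""
    correct = 0
    for i in eval_idx:
        candidates = [j for j in subset if j != i]
        if not candidates:
            continue  # no prototype available: counted as an error
        nearest = min(candidates, key=lambda j: sq_dist(X[i], X[j]))
        correct += y[nearest] == y[i]
    return correct / len(eval_idx)


def evolve_stratum(X, y, idx, rng, pop=20, gens=20, alpha=0.5, p_mut=0.02):
    """Search binary inclusion masks over one stratum; return kept indices."""
    n = len(idx)

    def fitness(mask):
        subset = [idx[k] for k in range(n) if mask[k]]
        accuracy = one_nn_accuracy(X, y, subset, idx)
        reduction = 1 - len(subset) / n
        return alpha * accuracy + (1 - alpha) * reduction

    def tournament(scored):
        # binary tournament: better of two random (fitness, mask) pairs
        return max(rng.sample(scored, 2), key=lambda s: s[0])[1]

    masks = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop)]
    scored = [(fitness(m), m) for m in masks]
    for _ in range(gens):
        elite = max(scored, key=lambda s: s[0])[1]
        children = [elite]  # elitism: keep the best mask unchanged
        while len(children) < pop:
            a, b = tournament(scored), tournament(scored)
            # uniform crossover, then independent bit-flip mutation
            child = [a[k] if rng.random() < 0.5 else b[k] for k in range(n)]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        scored = [(fitness(m), m) for m in children]
    best = max(scored, key=lambda s: s[0])[1]
    return [idx[k] for k in range(n) if best[k]]


def stratified_selection(X, y, t=3, seed=0, **ga_kwargs):
    """Run the GA independently on each stratum and join the selections."""
    rng = random.Random(seed)
    selected = []
    for stratum in stratify(y, t, rng):
        selected.extend(evolve_stratum(X, y, stratum, rng, **ga_kwargs))
    return sorted(selected)


if __name__ == "__main__":
    rng = random.Random(1)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
    X += [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(60)]
    y = [0] * 60 + [1] * 60
    S = stratified_selection(X, y, t=3)
    print(f"kept {len(S)} of {len(X)} instances, "
          f"reduction {1 - len(S) / len(X):.2f}")
```

In this sketch the benefit of stratifying is visible in `fitness`: each evaluation compares instances only within one stratum, so its cost grows with the stratum size rather than with the whole data set, and each GA searches roughly 2^(N/t) masks instead of 2^N.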