On the combination of evolutionary algorithms and stratified strategies for training set selection in data mining

  • Authors:
  • José Ramón Cano; Francisco Herrera; Manuel Lozano

  • Affiliations:
  • Department of Computer Science, University of Jaén, 23071 Jaén, Spain; Department of Computer Science and Artificial Intelligence, ETS de Ingeniería Informática, University of Granada, 18071 Granada, Spain; Department of Computer Science and Artificial Intelligence, ETS de Ingeniería Informática, University of Granada, 18071 Granada, Spain

  • Venue:
  • Applied Soft Computing
  • Year:
  • 2006

Abstract

In this paper, we present a new approach for training set selection in large data sets. The algorithm combines stratification with evolutionary algorithms: stratification reduces the size of the domain in which selection is applied, while the evolutionary method selects the most representative instances. The proposal is compared with seven non-evolutionary algorithms, which are also executed in a stratified way. The analysis follows two evaluation approaches: the balance between reduction and accuracy of the selected subsets, and the balance between interpretability and accuracy of the representation models associated with these subsets. The algorithms have been assessed on large and huge data sets. The study shows that stratified evolutionary instance selection consistently outperforms the non-evolutionary algorithms. Its main advantages are high instance reduction rates, high classification accuracy and highly interpretable models.
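The abstract describes two stages: split the data into strata, then run an evolutionary search for a representative subset within each stratum. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes disjoint strata that preserve class proportions, a binary chromosome (one gene per instance in the stratum), and a fitness that weights leave-one-out 1-NN accuracy against the reduction rate with a hypothetical weight `alpha = 0.5`. The search is a plain elitist generational genetic algorithm with one-point crossover and bit-flip mutation, swapped in for simplicity; it is not necessarily the evolutionary model evaluated in the paper. The final training set is the union of the subsets selected in each stratum. All function names and parameters are illustrative.

```python
import random
from collections import defaultdict


def stratify(labels, t, rng):
    """Split instance indices into t disjoint strata that keep class proportions
    (assumption: round-robin assignment of shuffled indices within each class)."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    strata = [[] for _ in range(t)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for k, i in enumerate(idxs):
            strata[k % t].append(i)
    return strata


def loo_1nn_accuracy(X, y, stratum, mask):
    """Leave-one-out 1-NN accuracy on the stratum, using only selected instances."""
    selected = [i for i, keep in zip(stratum, mask) if keep]
    correct = 0
    for i in stratum:
        best, best_d = None, float("inf")
        for j in selected:
            if j == i:  # an instance may not classify itself
                continue
            d = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            if d < best_d:
                best, best_d = j, d
        if best is not None and y[best] == y[i]:
            correct += 1
    return correct / len(stratum)


def fitness(X, y, stratum, mask, alpha):
    """Hypothetical weighted fitness: alpha * accuracy + (1 - alpha) * reduction."""
    reduction = 1.0 - sum(mask) / len(mask)
    return alpha * loo_1nn_accuracy(X, y, stratum, mask) + (1 - alpha) * reduction


def ga_select(X, y, stratum, rng, pop_size=16, generations=30, p_mut=0.05, alpha=0.5):
    """Elitist generational GA over binary masks; returns the selected indices."""
    n = len(stratum)
    score = lambda m: fitness(X, y, stratum, m, alpha)
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]  # bit flips
            children.append(child)
        pop = elite + children
    best = max(pop, key=score)
    return [i for i, keep in zip(stratum, best) if keep]


def stratified_evolutionary_selection(X, y, t=3, seed=0, **ga_kwargs):
    """Run the GA independently on each stratum and join the selected subsets."""
    rng = random.Random(seed)
    selected = []
    for stratum in stratify(y, t, rng):
        selected.extend(ga_select(X, y, stratum, rng, **ga_kwargs))
    return sorted(selected)


if __name__ == "__main__":
    data_rng = random.Random(1)
    X = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(30)]
    X += [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(30)]
    y = [0] * 30 + [1] * 30
    subset = stratified_evolutionary_selection(X, y, t=3)
    print(f"kept {len(subset)} of {len(X)} instances")
```

Because each GA only evaluates the instances of its own stratum, the cost of each 1-NN fitness evaluation grows with the stratum size rather than with the full data set size, which is the scaling argument the abstract makes for stratification. Raising the hypothetical `alpha` favors accuracy over reduction.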