Combining instance selection methods based on data characterization: An approach to increase their effectiveness

  • Authors:
  • Yoel Caises;Antonio González;Enrique Leyva;Raúl Pérez

  • Affiliations:
  • Facultad de Informática y Matemática, Universidad de Holguín, 80100 Holguín, Cuba;Dpto de Ciencias de la Computación e IA, ETSIIT, Universidad de Granada, 18071 Granada, Spain;Facultad de Informática y Matemática, Universidad de Holguín, 80100 Holguín, Cuba;Dpto de Ciencias de la Computación e IA, ETSIIT, Universidad de Granada, 18071 Granada, Spain

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2011

Quantified Score

Hi-index 0.07

Abstract

Although many instance selection methods have been proposed, none of them consistently outperforms the others across a wide range of domains. In recent years, many authors have concluded that data must be characterized in order to apply the most suitable selection criterion in each case. In light of this hypothesis, we propose a set of measures to characterize databases. These measures are used in decision rules that, given their values for a database, choose from a set of pre-selected methods the method, or combination of methods, expected to produce the best results. The rules were extracted from an empirical analysis of the behavior of several methods on several data sets and then integrated into an algorithm, which was evaluated experimentally on 20 databases with six different learning paradigms. The results were compared with those of five well-known state-of-the-art methods.
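The general idea of characterizing a data set and then letting rules pick a selection strategy can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two measures (a leave-one-out 1-NN error rate as a rough proxy for class overlap, and a class imbalance ratio), the thresholds, the rule structure, and the strategy names `noise_filter` and `condensation` are all hypothetical stand-ins for the measures, rules, and methods the authors actually use.

```python
from collections import Counter
import math


def loo_1nn_error(X, y):
    """Leave-one-out 1-NN error rate (hypothetical overlap/noise measure)."""
    errors = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, math.inf
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = math.dist(xi, xj)  # Euclidean distance (Python 3.8+)
            if d < best_d:
                best_j, best_d = j, d
        if y[best_j] != y[i]:
            errors += 1
    return errors / len(X)


def imbalance_ratio(y):
    """Size of the largest class divided by the size of the smallest."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


def characterize(X, y):
    """Compute the (assumed) characterization measures for a data set."""
    return {"overlap": loo_1nn_error(X, y), "imbalance": imbalance_ratio(y)}


def choose_methods(measures, overlap_threshold=0.2, imbalance_threshold=3.0):
    """Hand-written illustrative rules; thresholds are arbitrary assumptions."""
    if measures["overlap"] > overlap_threshold:
        if measures["imbalance"] > imbalance_threshold:
            # Noisy and imbalanced: only filter, to avoid thinning the minority class.
            return ["noise_filter"]
        # Noisy but balanced: filter first, then condense.
        return ["noise_filter", "condensation"]
    # Clean, well-separated classes: condensation alone.
    return ["condensation"]


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    y = [0, 0, 0, 1, 1, 1]
    print(choose_methods(characterize(X, y)))
```

In the paper the rules are derived empirically from the observed behavior of several methods on several data sets; the hand-written thresholds above only show where such rules would sit in the pipeline.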