Combining instance selection methods based on data characterization: An approach to increase their effectiveness

Authors:
Yoel Caises; Antonio González; Enrique Leyva; Raúl Pérez
Affiliations:
Facultad de Informática y Matemática, Universidad de Holguín, 80100 Holguín, Cuba; Dpto de Ciencias de la Computación e IA, ETSIIT, Universidad de Granada, 18071 Granada, Spain; Facultad de Informática y Matemática, Universidad de Holguín, 80100 Holguín, Cuba; Dpto de Ciencias de la Computación e IA, ETSIIT, Universidad de Granada, 18071 Granada, Spain
Venue:
Information Sciences: an International Journal
Year:
2011

Quantified Score

Hi-index	0.07

Abstract

Although many instance selection methods have been proposed, none of them consistently outperforms the others over a wide range of domains. In recent years many authors have concluded that data must be characterized in order to apply the most suitable selection criterion in each case. In light of this hypothesis, we propose a set of measures to characterize databases. These measures are used in decision rules which, given the measure values for a database, choose from a fixed pool of candidate methods the method, or combination of methods, that is expected to produce the best results. The rules were extracted from an empirical analysis of the behavior of several methods on several data sets and then integrated into an algorithm, which was experimentally evaluated over 20 databases and with six different learning paradigms. The results were compared with those of five well-known state-of-the-art methods.
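To make the pipeline in the abstract concrete (characterize the data, apply decision rules, run the chosen method or combination), here is a minimal Python sketch. Everything specific in it is an assumption for illustration and is not taken from the paper: the two measures (`imbalance`, `loo_knn_error`), the single rule and its `noise_threshold` of 0.05, and the choice of edited nearest neighbor (`enn`, with k = 3) and condensed nearest neighbor (`cnn`) as the candidate pool. It also assumes at least two instances and that editing never empties the data set.

```python
import math
from collections import Counter

def knn_label(x, data, k=1, skip=None):
    """Majority label among the k nearest points to x, ignoring index `skip`."""
    order = sorted((i for i in range(len(data)) if i != skip),
                   key=lambda i: math.dist(x, data[i][0]))[:k]
    return Counter(data[i][1] for i in order).most_common(1)[0][0]

def characterize(data, k=3):
    """Illustrative measures only (hypothetical, not the paper's measure set)."""
    counts = Counter(label for _, label in data)
    errors = sum(knn_label(x, data, k, skip=i) != y
                 for i, (x, y) in enumerate(data))
    return {
        "imbalance": max(counts.values()) / min(counts.values()),
        "loo_knn_error": errors / len(data),  # leave-one-out k-NN error rate
    }

def enn(data, k=3):
    """Edited nearest neighbor: drop points that their k neighbors outvote."""
    return [(x, y) for i, (x, y) in enumerate(data)
            if knn_label(x, data, k, skip=i) == y]

def cnn(data):
    """Condensed nearest neighbor: keep points that 1-NN on the store gets wrong."""
    store = [data[0]]
    changed = True
    while changed:
        changed = False
        for item in data:
            if item not in store and knn_label(item[0], store) != item[1]:
                store.append(item)
                changed = True
    return store

def choose_methods(measures, noise_threshold=0.05):
    """Hypothetical decision rule: filter noise first when the data looks noisy."""
    # Only one measure is consulted here to keep the sketch short.
    if measures["loo_knn_error"] > noise_threshold:
        return [enn, cnn]
    return [cnn]

def select_instances(data):
    """Characterize the data, pick a method pipeline, and apply it in order."""
    subset = data
    for method in choose_methods(characterize(data)):
        subset = method(subset)
    return subset
```

Each instance is a `(point, label)` pair such as `((0.0, 0.1), "a")`, and `select_instances(data)` returns the retained subset. The point of the design is that the rule in `choose_methods` is the only part tied to a particular data set's characteristics, so the measures, thresholds, and candidate methods can be swapped without touching the rest.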