On the k-NN performance in a challenging scenario of imbalance and overlapping

Authors:
V. García; R. A. Mollineda; J. S. Sánchez
Affiliations:
Instituto Tecnológico de Toluca. Av. Tecnológico s/n, Laboratorio de Reconocimiento de Patrones, 52140, Metepec, México; Universitat Jaume I. Av. Vicent Sos Baynat s/n, Departament de Llenguatges i Sistemes Informàtics, 12071, Castelló, Spain; Universitat Jaume I. Av. Vicent Sos Baynat s/n, Departament de Llenguatges i Sistemes Informàtics, 12071, Castelló, Spain
Venue:
Pattern Analysis & Applications - Special Issue: Non-parametric distance-based classification techniques and their applications
Year:
2008

Cited 13

On the 2-tuples based genetic tuning performance for fuzzy rule based classification systems in imbalanced data-sets

Information Sciences: an International Journal
Analysis of an evolutionary RBFN design algorithm, CO2RBFN, for imbalanced data sets

Pattern Recognition Letters
Genetics-based machine learning for rule induction: state of the art, taxonomy, and comparative study

IEEE Transactions on Evolutionary Computation
Addressing the classification with imbalanced data: open problems and new challenges on class distribution

HAIS'11 Proceedings of the 6th international conference on Hybrid artificial intelligent systems - Volume Part I
Back propagation with balanced MSE cost function and nearest neighbor editing for handling class overlap and class imbalance

IWANN'11 Proceedings of the 11th international conference on Artificial neural networks conference on Advances in computational intelligence - Volume Part I
On the effectiveness of preprocessing methods when dealing with different levels of class imbalance

Knowledge-Based Systems
Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics

Expert Systems with Applications: An International Journal
A hierarchical genetic fuzzy system based on genetic programming for addressing classification with highly imbalanced and borderline data-sets

Knowledge-Based Systems
A hybrid method to face class overlap and class imbalance on neural networks and multi-class scenarios

Pattern Recognition Letters
EUSBoost: Enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling

Pattern Recognition
Class imbalance and the curse of minority hubs

Knowledge-Based Systems
Addressing imbalanced classification with instance generation techniques: IPADE-ID

Neurocomputing
Cost-sensitive decision tree ensembles for effective imbalanced classification

Applied Soft Computing

Abstract

A two-class data set is said to be imbalanced when one (minority) class is heavily under-represented with respect to the other (majority) class. When the classes also overlap significantly, learning from imbalanced data can become very difficult. The difficulty grows further if the overall imbalance ratio differs from the local imbalance ratios in the overlap regions. This paper explains the behaviour of the k-nearest neighbour (k-NN) rule when learning from such a complex scenario. This local model is compared with other machine learning algorithms, focusing on how their behaviour depends on a number of data complexity features (global imbalance, size of the overlap region, and its local imbalance). As a result, several conclusions useful for classifier design are drawn.
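
The distinction between the global imbalance ratio and the local imbalance ratio inside an overlap region can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional toy data, the overlap interval [50, 60), the definition of the ratio as majority count over minority count, and the helper names `imbalance_ratio` and `knn_predict`.

```python
# Illustrative sketch only: the data layout and all names are assumptions,
# not the experimental setup of the paper.

def imbalance_ratio(labels, minority=1):
    """Return the majority-to-minority count ratio of a label list."""
    n_min = sum(1 for y in labels if y == minority)
    n_maj = len(labels) - n_min
    return n_maj / n_min if n_min else float("inf")

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    Ties (possible for even k) are resolved in favour of the majority class 0.
    """
    def sq_dist(point):
        return sum((a - b) ** 2 for a, b in zip(point, query))
    nearest = sorted(train, key=lambda item: sq_dist(item[0]))[:k]
    minority_votes = sum(label for _, label in nearest)
    return 1 if 2 * minority_votes > k else 0

# Hypothetical 1-D data: majority (label 0) spread over [0, 60),
# minority (label 1) spread over [50, 100).
majority = [((i * 0.15,), 0) for i in range(400)]
minority = [((50 + j * 0.5,), 1) for j in range(100)]
train = majority + minority

# The classes overlap where both have samples: 50 <= x < 60.
overlap = [(x, y) for x, y in train if 50 <= x[0] < 60]

global_ir = imbalance_ratio([y for _, y in train])
local_ir = imbalance_ratio([y for _, y in overlap])

# A minority training point inside the overlap region, classified with two k values.
pred_k1 = knn_predict(train, (55.0,), k=1)
pred_k5 = knn_predict(train, (55.0,), k=5)

print(f"global IR = {global_ir:.2f}, local IR in overlap = {local_ir:.2f}")
print(f"k=1 -> class {pred_k1}, k=5 -> class {pred_k5}")
```

In this toy setting the two ratios differ, and the label assigned to a minority point in the overlap region changes with k, because the vote depends only on the class mix in the point's immediate neighbourhood. This illustrates why a local model such as k-NN is sensitive to the local rather than the global imbalance. It says nothing about the paper's actual results.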