On the k-NN performance in a challenging scenario of imbalance and overlapping

  • Authors:
  • V. García;R. A. Mollineda;J. S. Sánchez

  • Affiliations:
  • Instituto Tecnológico de Toluca. Av. Tecnológico s/n, Laboratorio de Reconocimiento de Patrones, 52140, Metepec, México;Universitat Jaume I. Av. Vicent Sos Baynat s/n, Departament de Llenguatges i Sistemes Informàtics, 12071, Castelló, Spain;Universitat Jaume I. Av. Vicent Sos Baynat s/n, Departament de Llenguatges i Sistemes Informàtics, 12071, Castelló, Spain

  • Venue:
  • Pattern Analysis & Applications - Special Issue: Non-parametric distance-based classification techniques and their applications
  • Year:
  • 2008

Abstract

A two-class data set is said to be imbalanced when one (minority) class is heavily under-represented with respect to the other (majority) class. When the classes also overlap significantly, learning from imbalanced data can be very difficult. Moreover, if the overall imbalance ratio differs from the local imbalance ratios in the overlap regions, the task can become a major challenge. This paper explains the behaviour of the k-nearest neighbour (k-NN) rule when learning from such a complex scenario. This local model is compared with other machine learning algorithms, with attention to how their behaviour depends on a number of data complexity features (global imbalance, size of the overlap region, and its local imbalance). As a result, several conclusions useful for classifier design are drawn.
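
The difference between a global imbalance ratio and a local one inside an overlap region can be illustrated with a small sketch. The Python code below is only an illustration under assumed settings, not the experimental design of the paper: the region boundaries, the class counts, and the helper names (`imbalance_ratio`, `sample_region`, `build_dataset`, `knn_predict`, `per_class_recall`) are all hypothetical. It builds a two-class set in which class 0 is the global majority (400 versus 100 points) while class 1 dominates the overlap region (80 versus 20 points), and then measures per-class recall of a plain k-NN rule.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the larger class count to the smaller one (two classes)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def sample_region(n, x_lo, x_hi, label, rng):
    """Draw n points uniformly from [x_lo, x_hi] x [0, 100], all with one label."""
    return [((rng.uniform(x_lo, x_hi), rng.uniform(0, 100)), label)
            for _ in range(n)]


def build_dataset(seed):
    """Hypothetical layout: class 0 only for x in [0, 50], overlap for x in
    [50, 75], class 1 only for x in [75, 100]. Returns (all points, overlap points)."""
    rng = random.Random(seed)
    majority_only = sample_region(380, 0, 50, 0, rng)
    overlap = (sample_region(20, 50, 75, 0, rng)
               + sample_region(80, 50, 75, 1, rng))
    minority_only = sample_region(20, 75, 100, 1, rng)
    return majority_only + overlap + minority_only, overlap


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`
    (squared Euclidean distance; use odd k to avoid two-class ties)."""
    nearest = sorted(
        train,
        key=lambda s: (s[0][0] - point[0]) ** 2 + (s[0][1] - point[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def per_class_recall(train, test, k):
    """Fraction of test points of each class that k-NN labels correctly."""
    hits, totals = Counter(), Counter()
    for point, label in test:
        totals[label] += 1
        if knn_predict(train, point, k) == label:
            hits[label] += 1
    return {c: hits[c] / totals[c] for c in totals}


if __name__ == "__main__":
    train, train_overlap = build_dataset(seed=0)
    test, _ = build_dataset(seed=1)
    print("global imbalance ratio:", imbalance_ratio([l for _, l in train]))
    print("overlap class counts:", Counter(l for _, l in train_overlap))
    for k in (1, 5, 11):
        print("k =", k, "per-class recall:", per_class_recall(train, test, k))
```

Reporting recall per class rather than overall accuracy is a deliberate choice in this sketch, since overall accuracy on imbalanced data is dominated by the majority class and can hide poor minority-class performance.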