An analysis of how training data complexity affects the nearest neighbor classifiers

  • Authors:
  • J. S. Sánchez; R. A. Mollineda; J. M. Sotoca

  • Affiliations:
  • Universitat Jaume I, Departamento de Llenguatges i Sistemes Informátics, Av. Vicent Sos Baynat s/n, 12071, Castelló, Spain

  • Venue:
  • Pattern Analysis & Applications
  • Year:
  • 2007

Abstract

The k-nearest neighbors (k-NN) classifier is one of the most popular supervised classification methods. It is simple, intuitive, and accurate in a wide variety of real-world domains. Despite this simplicity and effectiveness, practical use of the rule has historically been limited by its high storage requirements and computational cost. In addition, the performance of the classifier appears to be strongly sensitive to the complexity of the training data. In this context, we use several problem difficulty measures to characterize the behavior of the k-NN rule in particular situations. More specifically, the analysis uses data complexity measures that describe class overlap, feature space dimensionality, and class density, and examines how they relate to the practical accuracy of the classifier.
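
As a rough illustration of the ideas in the abstract, the Python sketch below implements a plain k-NN rule together with two simple difficulty indicators: the leave-one-out error of the nearest neighbor rule and a two-class Fisher discriminant ratio for a single feature. Both indicators are assumptions chosen for illustration, since the abstract does not name the specific complexity measures used in the paper. The function names (`knn_predict`, `loo_knn_error`, `fisher_ratio`) and the toy data sets are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_knn_error(X, y, k=1):
    """Leave-one-out error rate of the k-NN rule (a rough class-overlap indicator)."""
    errors = 0
    for i in range(len(X)):
        # Hold out point i and classify it with the remaining points.
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) != y[i]:
            errors += 1
    return errors / len(X)


def fisher_ratio(X, y, feature):
    """Two-class Fisher discriminant ratio, (m1 - m2)^2 / (v1 + v2), for one feature."""
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two classes")
    stats = []
    for label in labels:
        values = [x[feature] for x, c in zip(X, y) if c == label]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)  # population variance
        stats.append((mean, var))
    (m1, v1), (m2, v2) = stats
    if v1 + v2 == 0:
        # Degenerate case: no within-class spread on this feature.
        return math.inf if m1 != m2 else 0.0
    return (m1 - m2) ** 2 / (v1 + v2)


# Hypothetical toy data: two well-separated clusters ...
separated_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
               (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
separated_y = ["a", "a", "a", "a", "b", "b", "b", "b"]

# ... and two classes interleaved along a single feature.
interleaved_X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
interleaved_y = ["a", "b", "a", "b", "a", "b"]

print(loo_knn_error(separated_X, separated_y), fisher_ratio(separated_X, separated_y, 0))
print(loo_knn_error(interleaved_X, interleaved_y), fisher_ratio(interleaved_X, interleaved_y, 0))
```

On these toy sets the separated clusters give a low leave-one-out error and a high Fisher ratio, while the interleaved classes give the opposite, which is the kind of relation between data complexity and k-NN accuracy that the abstract describes. The brute-force distance search also reflects the storage and computation costs mentioned above, since every query scans the whole training set.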