Adaptive k-nearest-neighbor classification using a dynamic number of nearest neighbors

  • Authors:
  • Stefanos Ougiaroglou; Alexandros Nanopoulos; Apostolos N. Papadopoulos; Yannis Manolopoulos; Tatjana Welzer-Druzovec

  • Affiliations:
  • Department of Informatics, Aristotle University, Thessaloniki, Greece (Ougiaroglou, Nanopoulos, Papadopoulos, Manolopoulos); Faculty of Electrical Eng. and Computer Science, University of Maribor, Slovenia (Welzer-Druzovec)

  • Venue:
  • ADBIS'07: Proceedings of the 11th East European Conference on Advances in Databases and Information Systems
  • Year:
  • 2007

Abstract

Classification based on k-nearest neighbors (kNN classification) is one of the most widely used classification methods. The number k of nearest neighbors needed for high classification accuracy must be given in advance and depends strongly on the data set. For large data sets, a sequential or binary search for the NNs is impractical because of its computational cost, so indexing schemes are frequently used to speed up classification. If the required number of nearest neighbors is high, however, even an index may not deliver adequate performance. In this paper, we demonstrate that the execution of the nearest-neighbor search algorithm can be interrupted once certain criteria are satisfied. A decision can therefore be made without computing all k nearest neighbors of a new object. We study three heuristics that give the nearest-neighbor algorithm this early-break capability. These heuristics aim at (i) reducing computation and I/O costs as much as possible and (ii) keeping classification accuracy high. Experimental results on real-life data sets show that the proposed method can achieve better performance than existing methods.
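The early-break idea can be illustrated with a small sketch. The abstract does not spell out the three heuristics, so the two stopping rules below are illustrative assumptions and not the paper's heuristics. The first is an exact rule: it stops as soon as the leading class can no longer be overtaken by the neighbors still to come. The second is a hypothetical confidence rule, controlled by the made-up parameters `confidence` and `min_neighbors`, which stops once the leading class holds a given fraction of the neighbors seen so far. The helper `incremental_nn` is also hypothetical. It stands in for an index-based incremental nearest-neighbor search and is implemented here as a simple heap over a linear scan.

```python
import heapq
import math
from collections import Counter
from typing import Hashable, Iterator, Optional, Sequence, Tuple

# Each training object is (feature vector, class label).
Sample = Tuple[Sequence[float], Hashable]


def incremental_nn(query: Sequence[float],
                   data: Sequence[Sample]) -> Iterator[Tuple[float, Hashable]]:
    """Yield (distance, label) pairs in increasing distance order.

    Hypothetical stand-in for an index-based incremental NN search;
    a heap over a linear scan is used purely for illustration.
    """
    # The index i breaks distance ties so labels are never compared.
    heap = [(math.dist(query, x), i, label) for i, (x, label) in enumerate(data)]
    heapq.heapify(heap)
    while heap:
        d, _, label = heapq.heappop(heap)
        yield d, label


def classify_early_break(query: Sequence[float],
                         data: Sequence[Sample],
                         k: int,
                         confidence: Optional[float] = None,
                         min_neighbors: int = 1) -> Tuple[Hashable, int]:
    """Majority-vote kNN that may stop before retrieving all k neighbors.

    Returns (predicted label, number of neighbors actually examined).
    """
    counts: Counter = Counter()
    examined = 0
    for _, label in incremental_nn(query, data):
        counts[label] += 1
        examined += 1
        ranked = counts.most_common(2)
        leader, lead_count = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        remaining = k - examined
        # Exact rule: even if every remaining neighbor joined the runner-up,
        # it could not catch the leader, so the full kNN vote is already decided.
        if lead_count > runner_up + remaining:
            return leader, examined
        # Hypothetical heuristic rule: accept the leader once it holds a large
        # enough share of the neighbors seen (may differ from the full vote).
        if (confidence is not None and examined >= min_neighbors
                and lead_count / examined >= confidence):
            return leader, examined
        if examined == k:
            break
    # Vote over all retrieved neighbors; ties go to the class seen first (nearest).
    return counts.most_common(1)[0][0], examined
```

With the exact rule alone, the prediction always matches plain majority-vote kNN, and only the number of retrieved neighbors shrinks. The confidence rule trades some accuracy for further savings, which mirrors the cost/accuracy balance the abstract describes. It is still only an assumption about what such a heuristic could look like.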