A novel two-level nearest neighbor classification algorithm using an adaptive distance metric

  • Authors:
  • Yunlong Gao; Jinyan Pan; Guoli Ji; Zijiang Yang

  • Affiliations:
  • Department of Automation, Xiamen University, Xiamen 361005, China; College of Information Engineering, Jimei University, Xiamen 361021, China; Department of Automation, Xiamen University, Xiamen 361005, China; School of Information Technology, York University, Toronto, Canada M3J 1P3

  • Venue:
  • Knowledge-Based Systems
  • Year:
  • 2012

Abstract

When the training set contains an infinite number of samples, the outcome of nearest neighbor classification (kNN) is independent of the adopted distance metric. In practice, however, the number of training samples is always finite, so the choice of distance metric becomes crucial to the performance of kNN. We propose a novel two-level nearest neighbor algorithm (TLNN) to minimize the mean absolute difference between the misclassification rates of kNN with finite and with infinite training samples. At the low level, we use Euclidean distance to determine a local subspace centered at an unlabeled test sample. At the high level, AdaBoost serves as guidance for local information extraction. TLNN maintains data invariance and produces neighborhoods that are highly stretched or elongated along different directions. The TLNN algorithm reduces the excessive dependence on statistical methods that learn prior knowledge from the training data. Even a linear combination of a few base classifiers produced by the weak learner in AdaBoost can yield much better kNN classifiers. Experiments on both synthetic and real-world data sets justify the proposed method.
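To illustrate the two-level idea described in the abstract, the following is a minimal sketch, not the authors' actual method: the low level selects a candidate neighborhood around the test point with plain Euclidean distance, and the high level re-ranks those candidates under a weighted (stretched) metric before voting. The per-feature `weights` are a hypothetical stand-in for the AdaBoost-derived guidance; the function names and parameters (`two_level_knn`, `k_low`, `k_high`) are illustrative assumptions.

```python
import math
from collections import Counter

def weighted_dist(a, b, w):
    # Weighted Euclidean distance; w stretches or shrinks each axis,
    # crudely mimicking an adaptive, elongated neighborhood.
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))

def two_level_knn(X, y, x, k_low=7, k_high=3, weights=None):
    """Classify x using a two-level neighbor search (illustrative sketch).

    X: list of feature tuples, y: list of labels, x: query point.
    weights: hypothetical per-feature weights standing in for the
    AdaBoost-guided metric of the paper.
    """
    d = len(x)
    if weights is None:
        weights = [1.0] * d
    ones = [1.0] * d
    # Low level: candidate neighborhood under the plain Euclidean metric.
    cand = sorted(range(len(X)),
                  key=lambda i: weighted_dist(X[i], x, ones))[:k_low]
    # High level: re-rank the candidates under the adaptive metric.
    top = sorted(cand, key=lambda i: weighted_dist(X[i], x, weights))[:k_high]
    # Majority vote among the refined neighbors.
    return Counter(y[i] for i in top).most_common(1)[0][0]
```

For example, with two well-separated clusters labeled 'a' and 'b', a query near the 'a' cluster is assigned 'a' regardless of moderate axis weighting, since the low-level Euclidean stage already confines the vote to that cluster.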