Bayesian adaptive nearest neighbor

  • Authors: Ruixin Guo; Sounak Chakraborty
  • Affiliations: Department of Statistics, University of Missouri-Columbia, Columbia, MO 65211-6100, USA (both authors)
  • Venue: Statistical Analysis and Data Mining
  • Year: 2010

Abstract

The k-nearest-neighbor (k-NN) classifier is a simple and popular classification method. However, it has a major drawback: it assumes that the class posterior probabilities are locally constant. It is also highly sensitive to the choice of the number of neighbors k, and it lacks a probabilistic formulation. In this article, we propose a Bayesian adaptive nearest neighbor method (BANN) that adaptively selects both the shape of the neighborhood and the number of neighbors k. The shape of the neighborhood is selected automatically, with the help of discriminants, according to the concentration of the data around each query point. The neighborhood size is not predetermined; it is left free by placing a prior distribution on it, which lets the model select an appropriate neighborhood size. The model is fitted using Markov chain Monte Carlo (MCMC), so predictions rely not on a single neighborhood size but on a mixture over k. Our BANN model is highly flexible: it detects local patterns in the data-generating process and adapts to them to improve prediction. We applied our model to four simulated data sets with special structures and five real-life benchmark data sets. BANN demonstrates substantial improvement over k-NN and discriminant adaptive nearest neighbor (DANN) in all nine case studies. It also outperforms probabilistic nearest neighbor (PNN) in most of the data analyses. Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 92-105, 2010
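
To make the idea of predicting with "a mixture of k" concrete, the following is a minimal sketch and not the authors' BANN model. It assumes plain Euclidean neighborhoods (no adaptive shape) and a fixed, hand-picked set of weights over k standing in for what MCMC sampling would estimate. The function names `knn_class_probs` and `mixture_over_k`, the toy data, and the weights are all hypothetical.

```python
import numpy as np

def knn_class_probs(X_train, y_train, x, k, n_classes):
    """Class proportions among the k nearest training points to x."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    counts = np.bincount(y_train[nearest], minlength=n_classes)
    return counts / k

def mixture_over_k(X_train, y_train, x, k_weights, n_classes):
    """Average the k-NN class proportions under weights on k.

    k_weights maps each candidate k to a weight; the weights are
    assumed to sum to 1 (here they are illustrative, not estimated).
    """
    probs = np.zeros(n_classes)
    for k, w in k_weights.items():
        probs += w * knn_class_probs(X_train, y_train, x, k, n_classes)
    return probs

# Toy data (hypothetical): two well-separated classes in the plane.
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
                    [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Illustrative weights over k; a Bayesian fit would estimate these.
k_weights = {1: 0.2, 3: 0.5, 5: 0.3}

probs = mixture_over_k(X_train, y_train, np.array([0.05, 0.05]), k_weights, 2)
print(probs)
```

With a single k the predicted probabilities can change abruptly as k changes; averaging over several values of k smooths that dependence, which is the motivation the abstract gives for not fixing the neighborhood size in advance.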