Exact bootstrap k-nearest neighbor learners

  • Authors: Brian M. Steele
  • Affiliation: Dept. of Mathematical Sciences, University of Montana, Missoula, MT 59812, USA
  • Venue: Machine Learning
  • Year: 2009

Abstract

Bootstrap aggregation, or bagging, is a method for reducing the prediction error of a statistical learner. The goal of bagging is to construct a new learner that is the expectation of the original learner with respect to the empirical distribution function. In nearly all cases this expectation cannot be computed analytically, and bootstrap sampling is used to produce an approximation. The k-nearest neighbor learners are exceptions: exact bagging of many k-nearest neighbor learners is straightforward. This article presents computationally simple and fast formulae for exact bagging of k-nearest neighbor learners and extends exact bagging from conventional bootstrap sampling (sampling n observations with replacement from a set of n observations) to bootstrap sub-sampling schemes (with and without replacement). In addition, a partially exact k-nearest neighbor regression learner is developed. The article also compares the prediction error of elementary and exact bagged k-nearest neighbor learners, and of several other ensemble methods, using a suite of publicly available data sets.
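
To illustrate why exact bagging is tractable for nearest-neighbor learners, consider the simplest case: 1-NN regression under the conventional bootstrap (n draws with replacement from n observations). The j-th closest training point to a query is the nearest neighbor in a bootstrap sample exactly when it appears in the sample and none of the j-1 closer points do, an event with probability w_j = ((n-j+1)/n)^n - ((n-j)/n)^n. The exact bagged prediction is therefore a fixed weighted average of the responses ordered by distance, with no resampling needed. The Python sketch below (not the paper's code; the function name and the tie-free, k=1 setting are illustrative assumptions) implements this telescoping-weight formula:

```python
import numpy as np

def exact_bagged_1nn(X_train, y_train, x_query):
    """Exact bagged 1-NN regression prediction under the conventional
    bootstrap (n draws with replacement from n observations).

    Illustrative sketch: the j-th closest training point is the nearest
    neighbor in a bootstrap sample iff it is drawn and none of the j-1
    closer points are, which happens with probability
        w_j = ((n - j + 1) / n)**n - ((n - j) / n)**n.
    The weights telescope to sum to 1, so the exact bagged prediction is
    a weighted average of responses ordered by distance (ties ignored).
    """
    n = len(y_train)
    # Order the responses by Euclidean distance to the query point.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    y_sorted = y_train[np.argsort(dists)]
    # Probability that the j-th closest point is the bootstrap 1-NN.
    j = np.arange(1, n + 1)
    weights = ((n - j + 1) / n) ** n - ((n - j) / n) ** n
    return float(weights @ y_sorted)

# Example usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X[:, 0] + rng.normal(scale=0.1, size=50)
print(exact_bagged_1nn(X, y, np.zeros(2)))
```

The same argument extends to the sub-sampling schemes mentioned in the abstract by replacing the weights with the corresponding inclusion probabilities; the paper's formulae for general k are more involved but follow the same principle of computing the expectation exactly rather than approximating it by Monte Carlo resampling.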