On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification

  • Authors:
  • Gérard Biau; Luc Devroye

  • Affiliations:
  • LSTA & LPMA, Université Pierre et Marie Curie-Paris VI, Boîte 158, Tour 15-25, 2ème étage, 4 place Jussieu, 75252 Paris Cedex 05, France and DMA, Ecole Normale Supér ...; School of Computer Science, McGill University, Montreal, Canada H3A 2K6

  • Venue:
  • Journal of Multivariate Analysis
  • Year:
  • 2010


Abstract

Let X_1,...,X_n be identically distributed random vectors in R^d, independently drawn according to some probability density. An observation X_i is said to be a layered nearest neighbour (LNN) of a point x if the hyperrectangle defined by x and X_i contains no other data points. We first establish consistency results on L_n(x), the number of LNN of x. Then, given a sample (X,Y),(X_1,Y_1),...,(X_n,Y_n) of independent identically distributed random vectors from R^d × R, one may estimate the regression function r(x) = E[Y|X=x] by the LNN estimate r_n(x), defined as an average over the Y_i's corresponding to those X_i which are LNN of x. Under mild conditions on r, we establish the consistency of E|r_n(x) - r(x)|^p towards 0 as n → ∞, for almost all x and all p ≥ 1, and discuss the links between r_n and the random forest estimates of Breiman (2001) [8]. We finally show the universal consistency of the bagged (bootstrap-aggregated) nearest neighbour method for regression and classification.
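The LNN definition above is straightforward to compute directly. The following is a minimal sketch (not the authors' code; the function names and the brute-force O(n^2) search are illustrative assumptions) of the LNN set and the resulting regression estimate r_n(x):

```python
import numpy as np

def layered_nearest_neighbours(x, X):
    """Indices i such that X[i] is a layered nearest neighbour (LNN) of x:
    the axis-aligned hyperrectangle spanned by x and X[i] contains no other
    sample point. Brute-force O(n^2 d) check, for illustration only."""
    lnn = []
    for i in range(len(X)):
        lo = np.minimum(x, X[i])          # lower corner of the hyperrectangle
        hi = np.maximum(x, X[i])          # upper corner
        inside = np.all((X >= lo) & (X <= hi), axis=1)
        inside[i] = False                 # X[i] itself does not disqualify it
        if not inside.any():
            lnn.append(i)
    return lnn

def lnn_regression(x, X, Y):
    """LNN estimate r_n(x): average of the Y_i over the LNN of x."""
    idx = layered_nearest_neighbours(x, X)
    return float(np.mean(Y[idx])) if idx else 0.0

# Example: with x at the origin, (1,1) and (-1,3) are LNN of x,
# but (2,2) is not, since (1,1) lies in its rectangle with x.
x = np.array([0.0, 0.0])
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, 3.0]])
Y = np.array([1.0, 5.0, 3.0])
print(layered_nearest_neighbours(x, X))   # → [0, 2]
print(lnn_regression(x, X, Y))            # → 2.0
```

With data drawn from a density, ties and boundary coincidences occur with probability zero, so the closed-rectangle test above matches the definition almost surely.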