Rates of convergence of nearest neighbor estimation under arbitrary sampling

  • Authors:
  • S. R. Kulkarni; S. E. Posner

  • Affiliations:
  • Dept. of Electr. Eng., Princeton Univ., NJ; -

  • Venue:
  • IEEE Transactions on Information Theory
  • Year:
  • 2006

Abstract

Rates of convergence for nearest neighbor estimation are established in a general framework in terms of metric covering numbers of the underlying space. The first result gives explicit finite-sample upper bounds for the classical independent and identically distributed (i.i.d.) random sampling problem in a separable metric space setting. The convergence rate is a function of the covering numbers of the support of the distribution. For example, for bounded subsets of R^r, the convergence rate is O(1/n^{2/r}). The main result extends the problem to allow samples drawn from a completely arbitrary random process in a separable metric space and examines the performance in terms of the individual sample sequences. The authors show that for every sequence of samples the asymptotic time-average of nearest neighbor risks equals twice the time-average of the conditional Bayes risks of the sequence. Finite-sample upper bounds under arbitrary sampling are again obtained in terms of the covering numbers of the underlying space. In particular, for bounded subsets of R^r the convergence rate of the time-averaged risk is O(1/n^{2/r}). The authors then establish a consistency result for k_n-nearest neighbor estimation under arbitrary sampling and prove a convergence rate matching the established rates for i.i.d. sampling. Finally, they show how their arbitrary-sampling results recover some classical i.i.d. sampling results and, in fact, extend them to stationary sampling. The framework and results are quite general, while the proof techniques are surprisingly elementary.
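
As a concrete illustration of the i.i.d. setting described above, the minimal sketch below runs a brute-force 1-nearest-neighbor regression estimate on samples drawn uniformly from [0, 1]^r and reports the empirical squared risk as n grows. This is not a construction from the paper: the target function, the noise level, the sample sizes, and helper names such as `nn_estimate` are assumptions made purely for illustration.

```python
# Illustrative sketch only: brute-force 1-nearest-neighbor regression on
# i.i.d. samples from [0, 1]^r, with an empirical look at how the averaged
# squared risk behaves as n grows. The target `f`, noise level, and sample
# sizes are assumptions for illustration, not constructions from the paper.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # A smooth (hence Lipschitz on [0, 1]^r) target function.
    return np.sin(2 * np.pi * x.sum(axis=-1))

def nn_estimate(x_query, x_train, y_train):
    # Predict at x_query using the label of the nearest training sample
    # (Euclidean metric here; any metric on the separable space would do).
    d = np.linalg.norm(x_train - x_query, axis=1)
    return y_train[np.argmin(d)]

def empirical_risk(n, r, n_test=500, noise=0.1):
    # Average squared error of the 1-NN estimate over fresh test points.
    x_train = rng.random((n, r))
    y_train = f(x_train) + noise * rng.standard_normal(n)
    x_test = rng.random((n_test, r))
    preds = np.array([nn_estimate(x, x_train, y_train) for x in x_test])
    return np.mean((preds - f(x_test)) ** 2)

if __name__ == "__main__":
    r = 2
    for n in (100, 400, 1600, 6400):
        print(f"n={n:5d}  empirical risk ~ {empirical_risk(n, r):.4f}")
```

For a Lipschitz target the observation noise contributes a fixed floor to this risk; the excess above that floor would be expected to shrink roughly like n^{-2/r}, which is the covering-number rate the abstract quotes for bounded subsets of R^r.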