Nonparametric density estimation by exact leave-p-out cross-validation

Authors:
Alain Celisse;Stéphane Robin
Affiliations:
UMR 518 AgroParisTech/INRA MIA, AgroParisTech, 16 rue Claude Bernard, F-75231 Paris Cedex 05, France;UMR 518 AgroParisTech/INRA MIA, AgroParisTech, 16 rue Claude Bernard, F-75231 Paris Cedex 05, France
Venue:
Computational Statistics & Data Analysis
Year:
2008

Citing (2)

Fast cross-validation of high-breakdown resampling methods for PCA
Computational Statistics & Data Analysis

A study of cross-validation and bootstrap for accuracy estimation and model selection
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2

Cited (3)

Combining regular and irregular histograms by penalized likelihood
Computational Statistics & Data Analysis

Estimation of the proportion of true null hypotheses in high-dimensional data under dependence
Computational Statistics & Data Analysis

Segmentation of the mean of heteroscedastic data via cross-validation
Statistics and Computing

Abstract

Density estimation is addressed by minimizing the L^2-risk for both histogram and kernel estimators. This quadratic risk is estimated by leave-p-out cross-validation (LPO), which, contrary to common belief, is computationally feasible thanks to closed-form formulas. The potential gain of LPO over V-fold cross-validation (V-fold) in terms of the bias-variance trade-off is highlighted. In particular, the extra variability that V-fold incurs through its preliminary random partition of the data is quantified exactly. Furthermore, exact expressions are derived for both the bias and the variance of the risk estimator with histograms. Plug-in estimates of these quantities are provided, and their accuracy is assessed by means of concentration inequalities. An adaptive procedure for selecting p in the case of histograms is then presented; it relies on minimizing the mean square error of the LPO risk estimator. Finally, a simulation study first illustrates the higher reliability of LPO with respect to V-fold and then assesses the behavior of the selection procedure. For instance, the optimality of leave-one-out (LOO) is shown, at least empirically, in the context of regular histograms.
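
To illustrate why a closed formula makes LPO tractable for histograms, here is a minimal Python sketch. It is not taken from the paper: the function names, the variable names, and the exact form of the expression are assumptions made for illustration. The closed form used here can be derived from the hypergeometric moments of the training-set bin counts, and it estimates the L^2-risk up to the additive constant given by the integral of the squared true density. The second function averages the same criterion over every one of the C(n, p) splits, which is only feasible for very small n, and serves as a sanity check.

```python
from itertools import combinations

import numpy as np


def lpo_risk_closed_form(counts, widths, p):
    """Leave-p-out risk estimate of a histogram, computed in closed form.

    counts: number of observations in each bin (full sample).
    widths: length of each bin.
    p: test-set size, with 1 <= p <= n - 1.
    Illustrative sketch; the notation is an assumption, not the paper's.
    """
    counts = np.asarray(counts, dtype=float)
    widths = np.asarray(widths, dtype=float)
    n = counts.sum()
    if not 1 <= p <= n - 1:
        raise ValueError("p must satisfy 1 <= p <= n - 1")
    freq = counts / n
    a = (2 * n - p) / ((n - 1) * (n - p))
    b = n * (n - p + 1) / ((n - 1) * (n - p))
    return a * np.sum(freq / widths) - b * np.sum(freq**2 / widths)


def lpo_risk_brute_force(x, edges, p):
    """Same criterion, averaged explicitly over all C(n, p) train/test splits."""
    x = np.asarray(x, dtype=float)
    edges = np.asarray(edges, dtype=float)
    n = x.size
    widths = np.diff(edges)
    # Bin index of each observation; the last bin is closed on the right.
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, widths.size - 1)
    total = 0.0
    n_splits = 0
    for test in combinations(range(n), p):
        test = np.array(test)
        train_mask = np.ones(n, dtype=bool)
        train_mask[test] = False
        train_counts = np.bincount(bins[train_mask], minlength=widths.size)
        density = train_counts / ((n - p) * widths)  # histogram built on the training set
        # Integral of the squared estimator minus twice its mean over the test points.
        total += np.sum(density**2 * widths) - 2.0 * np.mean(density[bins[test]])
        n_splits += 1
    return total / n_splits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random(8)                 # tiny sample so that enumeration stays cheap
    edges = np.linspace(0.0, 1.0, 4)  # regular histogram with 3 bins on [0, 1]
    counts = np.histogram(x, bins=edges)[0]
    for p in (1, 2, 3):
        print(p, lpo_risk_closed_form(counts, np.diff(edges), p),
              lpo_risk_brute_force(x, edges, p))
```

The closed form costs one pass over the bin counts whatever the value of p, whereas the enumeration grows combinatorially with n and p. The two should agree up to floating-point error.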