Nonparametric density estimation by exact leave-p-out cross-validation

  • Authors: Alain Celisse; Stéphane Robin

  • Affiliation (both authors): UMR 518 AgroParisTech/INRA MIA, AgroParisTech, 16 rue Claude Bernard, F-75231 Paris Cedex 05, France

  • Venue: Computational Statistics & Data Analysis
  • Year: 2008

Abstract

The problem of density estimation is addressed by minimizing the L^2-risk for both histogram and kernel estimators. This quadratic risk is estimated by leave-p-out cross-validation (LPO), which, contrary to common belief, is computationally feasible thanks to closed-form expressions. The potential gain of LPO over V-fold cross-validation (V-fold) in terms of the bias-variance trade-off is highlighted. The extra variability that V-fold incurs through its preliminary random partition of the data is quantified exactly. Furthermore, exact expressions are derived for both the bias and the variance of the LPO risk estimator for histograms. Plug-in estimates of these quantities are provided, and their accuracy is assessed through concentration inequalities. An adaptive procedure for selecting p in the case of histograms is then presented; it relies on minimizing the mean square error of the LPO risk estimator. Finally, a simulation study first illustrates the higher reliability of LPO compared with V-fold, and then assesses the behavior of the selection procedure. For instance, the optimality of leave-one-out (LOO) is shown, at least empirically, in the context of regular histograms.
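To make the claim about closed formulas concrete, the following minimal Python sketch computes the LPO estimate of the L^2-risk of a histogram in two ways: by enumerating every test subset of size p, and by a closed-form expression obtained from the hypergeometric moments of the training bin counts. The closed form is written here as an illustration of the idea and may differ in notation from the formula in the paper. The function names (`bin_index`, `lpo_closed_form`, `lpo_brute_force`) are hypothetical, all observations are assumed to lie within the bin edges, and both quantities estimate the risk only up to the additive constant given by the squared L^2-norm of the true density.

```python
from itertools import combinations

import numpy as np


def bin_index(x, edges):
    """Bin index of each observation (assumes edges[0] <= x <= edges[-1])."""
    idx = np.searchsorted(edges, x, side="right") - 1
    return np.clip(idx, 0, len(edges) - 2)


def lpo_closed_form(x, edges, p):
    """LPO risk estimate of a histogram from full-sample bin counts only."""
    x = np.asarray(x, dtype=float)
    edges = np.asarray(edges, dtype=float)
    n = x.size
    widths = np.diff(edges)
    counts = np.bincount(bin_index(x, edges), minlength=widths.size)
    a = (2 * n - p) / ((n - 1) * (n - p))
    b = n * (n - p + 1) / ((n - 1) * (n - p))
    return a * np.sum(counts / (n * widths)) - b * np.sum((counts / n) ** 2 / widths)


def lpo_brute_force(x, edges, p):
    """Same quantity by averaging over all C(n, p) train/test splits."""
    x = np.asarray(x, dtype=float)
    edges = np.asarray(edges, dtype=float)
    n = x.size
    widths = np.diff(edges)
    idx = bin_index(x, edges)
    total, n_splits = 0.0, 0
    for test in combinations(range(n), p):
        test = list(test)
        train = np.ones(n, dtype=bool)
        train[test] = False
        train_counts = np.bincount(idx[train], minlength=widths.size)
        heights = train_counts / ((n - p) * widths)  # histogram fitted on the training part
        # Integral of the squared estimator minus twice its mean value at the test points.
        total += np.sum(heights ** 2 * widths) - 2.0 * np.mean(heights[idx[test]])
        n_splits += 1
    return total / n_splits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.uniform(size=8)
    grid = np.linspace(0.0, 1.0, 5)  # four regular bins on [0, 1]
    for p in (1, 2, 3):
        print(p, lpo_closed_form(sample, grid, p), lpo_brute_force(sample, grid, p))
```

The enumeration costs C(n, p) histogram fits, whereas the closed form needs a single pass over the bin counts, which is what makes LPO practical for moderate p. In a model selection setting, one would evaluate `lpo_closed_form` over several candidate partitions and keep the one with the smallest value.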