In machine learning and statistics, kernel density estimators are rarely applied to multivariate data because of the difficulty of choosing a kernel bandwidth that avoids overfitting. However, recent advances in information-theoretic learning have revived interest in these models. With this motivation, in this paper we revisit the classical statistical problem of data-driven bandwidth selection by cross-validated maximum likelihood for Gaussian kernels. We solve the optimization problem both in the spherical case and in the general case, where a full covariance matrix is considered for the kernel. The fixed-point algorithms proposed in this paper obtain the maximum-likelihood bandwidth in a few iterations, without performing an exhaustive bandwidth search, which is infeasible in the multivariate case. The convergence of the proposed methods is proved. A set of classification experiments demonstrates the usefulness of the obtained models in pattern recognition.
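To illustrate the idea, here is a minimal sketch (not the paper's exact algorithm) of a fixed-point iteration for the spherical case. Setting the gradient of the leave-one-out cross-validated log-likelihood of a spherical Gaussian kernel to zero yields the update h² ← (1/nd) Σᵢ Σⱼ≠ᵢ wᵢⱼ ‖xᵢ − xⱼ‖², where the weights wᵢⱼ are the normalized kernel responsibilities; the function name and defaults below are illustrative choices:

```python
import numpy as np

def loo_ml_bandwidth(X, h2=1.0, n_iter=100, tol=1e-8):
    """Fixed-point estimate of a spherical Gaussian kernel bandwidth
    by leave-one-out cross-validated maximum likelihood (sketch)."""
    n, d = X.shape
    # Pairwise squared Euclidean distances, shape (n, n).
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    for _ in range(n_iter):
        # Unnormalized kernel weights; the (2*pi*h2)^(-d/2) factor
        # cancels in the normalization below.
        W = np.exp(-D2 / (2.0 * h2))
        np.fill_diagonal(W, 0.0)            # leave-one-out: exclude x_i itself
        W /= W.sum(axis=1, keepdims=True)   # responsibilities w_ij
        h2_new = (W * D2).sum() / (n * d)   # stationarity condition of the LOO log-likelihood
        if abs(h2_new - h2) < tol * h2:
            h2 = h2_new
            break
        h2 = h2_new
    return np.sqrt(h2)
```

In practice the iteration converges in a handful of steps, which is the appeal over a grid search: each update reuses the same pairwise-distance matrix, and no exhaustive sweep over candidate bandwidths is needed.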