A Comparative Study of Bandwidth Choice in Kernel Density Estimation for Naive Bayesian Classification

  • Authors:
  • Bin Liu; Ying Yang; Geoffrey I. Webb; Janice Boughton

  • Affiliations:
  • Clayton School of Information Technology, Monash University, Australia (all authors)

  • Venue:
  • PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2009

Abstract

Kernel density estimation (KDE) is an important method in nonparametric learning. While KDE has been studied extensively with respect to the accuracy of distribution estimation, it has not been studied extensively in the context of classification. This paper studies nine bandwidth selection schemes for kernel density estimation in the context of naive Bayesian classification, using 52 machine learning benchmark datasets. The contributions of this paper are threefold. First, it shows that some commonly used and very sophisticated bandwidth selection schemes do not perform well in naive Bayes. Surprisingly, some very simple bandwidth selection schemes give statistically significantly better performance. Second, it shows that kernel density estimation can achieve statistically significantly better classification performance than a commonly used discretization method in naive Bayes, but only when an appropriate bandwidth selection scheme is applied. Third, it reports the bandwidth distribution patterns of the investigated bandwidth selection schemes.
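
To make the setting concrete, the following is a minimal sketch of naive Bayes with a per-class, per-attribute Gaussian KDE, where the bandwidth rule is a pluggable function. Everything in it is an illustrative assumption rather than the paper's implementation: the abstract does not name the nine schemes, so the normal reference rule h = 1.06 * sigma * n^(-1/5) is used here only as a familiar simple example, and the names `rule_of_thumb_bandwidth`, `gaussian_kde_log_density`, and `KDENaiveBayes` are hypothetical.

```python
import numpy as np


def rule_of_thumb_bandwidth(x):
    """Normal reference rule h = 1.06 * sigma * n^(-1/5).

    Assumption: chosen for illustration only; not necessarily one of the
    nine schemes compared in the paper.
    """
    x = np.asarray(x, dtype=float)
    sigma = x.std(ddof=1) if x.size > 1 else 0.0
    h = 1.06 * sigma * x.size ** (-0.2)
    return h if h > 0 else 1e-3  # fall back to a small positive width


def gaussian_kde_log_density(train, query, h):
    """Log of a Gaussian KDE built on `train`, evaluated at each `query` point."""
    train = np.asarray(train, dtype=float)
    query = np.asarray(query, dtype=float)
    z = (query[:, None] - train[None, :]) / h
    log_kernels = -0.5 * z ** 2 - np.log(h * np.sqrt(2.0 * np.pi))
    # Numerically stable log-mean-exp over the training points.
    m = log_kernels.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_kernels - m).mean(axis=1, keepdims=True))).ravel()


class KDENaiveBayes:
    """Naive Bayes where each p(x_j | class) is a one-dimensional Gaussian KDE."""

    def fit(self, X, y, bandwidth_fn=rule_of_thumb_bandwidth):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.log_priors_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        # For every class, store (training values, bandwidth) per attribute.
        self.models_ = []
        for c in self.classes_:
            Xc = X[y == c]
            self.models_.append(
                [(Xc[:, j], bandwidth_fn(Xc[:, j])) for j in range(X.shape[1])]
            )
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = np.empty((X.shape[0], len(self.classes_)))
        for k, per_attribute in enumerate(self.models_):
            total = np.full(X.shape[0], self.log_priors_[k])
            # Naive independence assumption: sum the per-attribute log densities.
            for j, (values, h) in enumerate(per_attribute):
                total += gaussian_kde_log_density(values, X[:, j], h)
            scores[:, k] = total
        return self.classes_[scores.argmax(axis=1)]
```

Comparing bandwidth schemes in this sketch amounts to passing a different `bandwidth_fn` to `fit` and measuring classification error on held-out data, which mirrors the kind of comparison the abstract describes without reproducing its experimental protocol.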