Naive Bayes classifiers that perform well with continuous variables

  • Authors:
  • Remco R. Bouckaert

  • Affiliations:
  • Computer Science Department, University of Waikato & Xtal Mountain Information Technology, New Zealand

  • Venue:
  • AI'04: Proceedings of the 17th Australian Joint Conference on Advances in Artificial Intelligence
  • Year:
  • 2004


Abstract

There are three main methods for handling continuous variables in naive Bayes classifiers: the normal method (parametric approach), the kernel method (non-parametric approach), and discretization. In this article, we perform a methodologically sound comparison of the three methods, which shows large mutual differences between the methods, with no single method being universally better. This suggests that a method for selecting one of the three approaches to continuous variables could improve the overall performance of the naive Bayes classifier. We present three methods that can be implemented efficiently: v-fold cross validation for the normal, kernel, and discretization methods. Empirical evidence suggests that selection using 10-fold cross validation (especially when repeated 10 times) can largely and significantly improve the overall performance of naive Bayes classifiers and consistently outperform any of the three popular methods for dealing with continuous variables on their own. This is remarkable, since selection among more classifiers does not consistently result in better accuracy.
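
The selection procedure described in the abstract lends itself to a short illustration. The sketch below is not the paper's implementation; it uses scikit-learn components as stand-ins: GaussianNB for the normal (parametric) method, a simple per-class, per-feature Gaussian-KDE classifier for the kernel (non-parametric) method, and KBinsDiscretizer followed by CategoricalNB for discretization. The KDENaiveBayes class, the select_nb helper, and the bin and fold counts are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of selecting among the three treatments of continuous
# variables via 10-fold cross-validation (assumed setup, not the paper's code).
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB, CategoricalNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer


class KDENaiveBayes(BaseEstimator, ClassifierMixin):
    """Naive Bayes with one univariate kernel density estimate per
    (class, feature) pair -- a stand-in for the kernel method."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.priors_ = {c: np.mean(y == c) for c in self.classes_}
        # Fit a Gaussian KDE on each feature's values within each class.
        self.kdes_ = {
            c: [gaussian_kde(X[y == c, j]) for j in range(X.shape[1])]
            for c in self.classes_
        }
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Log-posterior per class: log prior + sum of per-feature log densities
        # (the small epsilon guards against log(0)).
        log_post = np.column_stack([
            np.log(self.priors_[c])
            + sum(np.log(self.kdes_[c][j](X[:, j]) + 1e-300)
                  for j in range(X.shape[1]))
            for c in self.classes_
        ])
        return self.classes_[np.argmax(log_post, axis=1)]


def select_nb(X, y, folds=10):
    """Pick the continuous-variable treatment with the best CV accuracy."""
    candidates = {
        "normal": GaussianNB(),
        "kernel": KDENaiveBayes(),
        # 10 uniform bins is an arbitrary illustrative choice.
        "discretization": make_pipeline(
            KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="uniform"),
            CategoricalNB(),
        ),
    }
    scores = {name: cross_val_score(clf, X, y, cv=folds).mean()
              for name, clf in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best].fit(X, y)
```

On a given dataset, select_nb(X, y) returns the name of the best-scoring treatment and a classifier refitted on all the data. The repeated variant the abstract favors (10 times repeated 10-fold cross validation) could be approximated by passing scikit-learn's RepeatedStratifiedKFold(n_splits=10, n_repeats=10) as the cv argument instead of a plain fold count.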