A hybrid discretization method for naïve Bayesian classifiers

Authors:
Tzu-Tsung Wong
Affiliations:
Institute of Information Management, National Cheng Kung University, 1, Ta-Sheuh Road, Tainan City 701, Taiwan, Republic of China
Venue:
Pattern Recognition
Year:
2012

Abstract

Since naïve Bayesian classifiers are suitable for processing discrete attributes, many methods have been proposed for discretizing continuous ones. However, no previous study applies more than one discretization method to the continuous attributes in a single data set for naïve Bayesian classifiers. Different approaches use different information embedded in continuous attributes to determine the discretization boundaries. Discretizing the continuous attributes in a data set with different methods is therefore likely to exploit that information more thoroughly and thus improve the performance of naïve Bayesian classifiers. In this study, we propose a nonparametric measure to evaluate the level of dependence between a continuous attribute and the class. This measure is then used to develop a hybrid method for discretizing continuous attributes so that the accuracy of the naïve Bayesian classifier can be enhanced. The hybrid method is tested on 20 data sets, and the results demonstrate that discretizing the continuous attributes in a data set with a mix of methods generally yields higher prediction accuracy.
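
The general idea of choosing a discretizer per attribute can be illustrated with a minimal Python sketch. It is not the method of the paper: the abstract does not define the nonparametric measure, so the rank-based score below (the share of rank variance explained by the class), the two candidate discretizers, the 0.5 threshold, and every function name are assumptions made purely for illustration. Which discretizer is paired with high or low dependence is likewise an assumption.

```python
from collections import defaultdict


def rank_values(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_dependence(values, labels):
    """Hypothetical stand-in measure in [0, 1]: the fraction of the
    variance of the ranks that is explained by the class labels."""
    ranks = rank_values(values)
    grand = sum(ranks) / len(ranks)
    total = sum((r - grand) ** 2 for r in ranks)
    if total == 0:
        return 0.0
    groups = defaultdict(list)
    for r, c in zip(ranks, labels):
        groups[c].append(r)
    between = sum(
        len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values()
    )
    return between / total


def equal_width_cuts(values, bins=3):
    """Unsupervised candidate: split the value range into equal-width bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + step * k for k in range(1, bins)]


def class_change_cuts(values, labels):
    """Supervised candidate (deliberately simple): cut midway between
    adjacent distinct values whose class labels differ."""
    pairs = sorted(zip(values, labels))
    cuts = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if c1 != c2 and v1 != v2:
            cuts.append((v1 + v2) / 2)
    return cuts


def hybrid_cuts(values, labels, threshold=0.5, bins=3):
    """Pick a discretizer for one attribute from its dependence score.
    The threshold and the direction of the choice are assumptions."""
    if rank_dependence(values, labels) >= threshold:
        return "supervised", class_change_cuts(values, labels)
    return "unsupervised", equal_width_cuts(values, bins)


if __name__ == "__main__":
    separated = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    print(hybrid_cuts(separated, ["a", "a", "a", "b", "b", "b"]))
    mixed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(hybrid_cuts(mixed, ["a", "b", "a", "b", "a", "b"]))
```

In this sketch each continuous attribute of a data set would be passed through hybrid_cuts separately, so attributes that track the class closely and attributes that do not can end up with different discretizers before the naïve Bayesian classifier is trained.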