In a naive Bayesian classifier, discrete variables, as well as discretized continuous variables, are assumed to have Dirichlet priors. This paper describes the implications and applications of this model-selection choice. We start by reviewing key properties of Dirichlet distributions, the most important of which is "perfect aggregation," which allows us to explain why discretization works for a naive Bayesian classifier. Because perfect aggregation holds for Dirichlet distributions, we can explain why, in general, discretization can outperform parameter estimation under a normal-distribution assumption. We can also explain why a wide variety of well-known discretization methods, such as entropy-based, ten-bin, and bin-log l, perform well, with no significant differences among them. We designed experiments on synthesized and real data sets to verify this explanation, and showed that, beyond the well-known methods, a wide variety of discretization methods all perform similarly. Our analysis leads to a lazy discretization method, which discretizes continuous variables according to the test data. The Dirichlet assumption implies that lazy methods can perform as well as eager discretization methods. We confirmed this implication empirically and extended the lazy method to classify set-valued and multi-interval data with a naive Bayesian classifier.
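The "perfect aggregation" property referred to above is the standard aggregation property of the Dirichlet distribution: merging components of a Dirichlet-distributed vector yields another Dirichlet whose hyperparameters are the corresponding sums. In LaTeX:

    \[
      (\theta_1,\dots,\theta_k) \sim \mathrm{Dir}(\alpha_1,\dots,\alpha_k)
      \;\Longrightarrow\;
      \Bigl(\textstyle\sum_{i\in I_1}\theta_i,\;\dots,\;\sum_{i\in I_m}\theta_i\Bigr)
      \sim \mathrm{Dir}\Bigl(\textstyle\sum_{i\in I_1}\alpha_i,\;\dots,\;\sum_{i\in I_m}\alpha_i\Bigr)
    \]
    for any partition \(I_1,\dots,I_m\) of \(\{1,\dots,k\}\).

Intuitively, this is why the exact placement of cut points matters little: the posterior probability mass assigned to any union of intervals depends only on the total count of observations falling in that union, not on how the union is subdivided.

As an illustration only, and not the authors' exact procedure, the following minimal Python sketch captures the lazy-discretization idea: at classification time, place an interval around each test value and estimate its class-conditional probability mass with Laplace smoothing, i.e., a symmetric Dirichlet prior. The bandwidth parameter `width` and the implied two-bin partition (the interval versus its complement) are assumptions of this sketch.

    import numpy as np

    def lazy_interval_likelihood(train_vals, test_val, width):
        """Estimate the probability that a value falls in an interval
        around test_val, from one class's training values, using a
        Laplace-smoothed count. `width` is a hypothetical bandwidth."""
        lo, hi = test_val - width / 2, test_val + width / 2
        in_bin = np.sum((train_vals >= lo) & (train_vals <= hi))
        # Dirichlet(1, 1) prior over the two bins induced lazily by the
        # test point: the interval and its complement.
        return (in_bin + 1) / (len(train_vals) + 2)

    def lazy_nb_predict(X_train, y_train, x_test, width=1.0):
        """Minimal lazy-discretization naive Bayes for continuous
        features: bins are placed around the test point at
        classification time, so training involves no discretization.
        X_train is an (n, d) array, y_train an (n,) label array."""
        classes = np.unique(y_train)
        scores = {}
        for c in classes:
            Xc = X_train[y_train == c]
            # Class prior, also Laplace smoothed.
            log_score = np.log((len(Xc) + 1) / (len(X_train) + len(classes)))
            for j in range(X_train.shape[1]):
                log_score += np.log(
                    lazy_interval_likelihood(Xc[:, j], x_test[j], width))
            scores[c] = log_score
        return max(scores, key=scores.get)

For example, lazy_nb_predict(X_train, y_train, x_test) returns the class maximizing the smoothed prior times the product of per-feature interval likelihoods; because the intervals are induced by the test point, all discretization work is deferred until classification, as in the lazy method described above.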