Incremental Flexible Frequency Discretization (IFFD) is a recently proposed discretization approach for Naïve Bayes (NB). IFFD performs satisfactorily by fixing the minimal interval frequency of the discretized intervals at a constant value. In this paper, we first argue that this setting cannot guarantee optimal classification performance in terms of classification error. We observe empirically that an optimal minimal interval frequency exists for each dataset. We therefore propose a sequential-search, wrapper-based incremental discretization method for NB, named Optimal Flexible Frequency Discretization (OFFD). Experiments were conducted on 17 datasets from the UCI machine learning repository, comparing the performance of NB trained on data discretized by OFFD, IFFD, PKID, and FFD. Results show that OFFD works better than these alternatives for NB. Further experiments comparing NB with OFFD discretization against C4.5 show that our new method outperforms C4.5 on most of the datasets tested.