Proportional k-Interval Discretization for Naive-Bayes Classifiers

Authors:
Ying Yang;Geoffrey I. Webb
Venue:
ECML '01 Proceedings of the 12th European Conference on Machine Learning
Year:
2001

Abstract

This paper argues that two commonly used discretization approaches, fixed k-interval discretization and entropy-based discretization, have sub-optimal characteristics for naive-Bayes classification. This analysis leads to a new discretization method, Proportional k-Interval Discretization (PKID), which adjusts the number and size of discretized intervals to the number of training instances and thereby seeks an appropriate trade-off between the bias and variance of the probability estimation for naive-Bayes classifiers. We justify PKID in theory and test it on a wide cross-section of datasets. Our experimental results suggest that, in comparison to its alternatives, PKID gives naive-Bayes classifiers competitive classification performance for smaller datasets and better classification performance for larger datasets.
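
The abstract does not give the rule PKID uses to tie interval count and interval size to the training-set size. The following is a minimal sketch, assuming the square-root rule commonly attributed to PKID: with n training values of an attribute, form about sqrt(n) equal-frequency intervals, each holding about sqrt(n) values. The function names pkid_cut_points and discretize, the midpoint placement of cut points, and the handling of tied values are illustrative assumptions, not taken from the paper.

```python
import math
from bisect import bisect_right


def pkid_cut_points(values):
    """Cut points for roughly sqrt(n) equal-frequency intervals.

    Assumption: interval count t = floor(sqrt(n)), so each interval
    also holds about sqrt(n) training values.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return []
    t = max(1, math.isqrt(n))  # assumed square-root rule
    cuts = []
    for i in range(1, t):
        idx = i * n // t  # index of the first value in interval i
        # Place a cut only between two distinct values, so tied
        # values always land in the same interval.
        if ordered[idx - 1] < ordered[idx]:
            cuts.append((ordered[idx - 1] + ordered[idx]) / 2)
    return cuts


def discretize(value, cuts):
    """Map a numeric value to the index of its interval."""
    return bisect_right(cuts, value)


if __name__ == "__main__":
    training = list(range(1, 17))  # n = 16, so 4 intervals of 4 values
    cuts = pkid_cut_points(training)
    print(cuts)
    print([discretize(v, cuts) for v in (1, 9, 16)])
```

Under this assumption, both the number of intervals and the number of instances per interval grow as the training set grows, which is one way to read the bias and variance trade-off the abstract describes: more intervals reduce discretization bias, while more instances per interval reduce the variance of the probability estimates.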