Learning naive Bayes for probability estimation by feature selection

Authors:
Liangxiao Jiang;Harry Zhang
Affiliations:
Faculty of Computer Science, China University of Geosciences, Hubei, Wuhan, P.R. China;Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada
Venue:
AI'06 Proceedings of the 19th international conference on Advances in Artificial Intelligence: Canadian Society for Computational Studies of Intelligence
Year:
2006

Citing 13
Cited 4

Probabilistic reasoning in intelligent systems: networks of plausible inference

On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
Machine Learning - Special issue on learning with probabilistic representations

Bayesian Network Classifiers
Machine Learning - Special issue on learning with probabilistic representations

Data mining: practical machine learning tools and techniques with Java implementations

Technical Note: Naive Bayes for Regression
Machine Learning

Structural extension to logistic regression: discriminative parameter learning of belief net classifiers
Eighteenth national conference on Artificial intelligence

Tree Induction for Probability-Based Ranking
Machine Learning

Learning Bayesian network classifiers by maximizing conditional likelihood
ICML '04 Proceedings of the twenty-first international conference on Machine learning

Structural Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers
Machine Learning

Naive Bayes models for probability estimation
ICML '05 Proceedings of the 22nd international conference on Machine learning

Discriminative model selection for belief net structures
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 2

An analysis of Bayesian classifiers
AAAI'92 Proceedings of the tenth national conference on Artificial intelligence

Induction of selective Bayesian classifiers
UAI'94 Proceedings of the Tenth international conference on Uncertainty in artificial intelligence

Survey of Improving Naive Bayes for Classification
ADMA '07 Proceedings of the 3rd international conference on Advanced Data Mining and Applications

Improving Tree augmented Naive Bayes for class probability estimation
Knowledge-Based Systems

A Modified Short and Fukunaga Metric based on the attribute independence assumption
Pattern Recognition Letters

Not so greedy: Randomly Selected Naive Bayes
Expert Systems with Applications: An International Journal

Abstract

Naive Bayes is a well-known, effective, and efficient classification algorithm, but its probability estimation is poor. In many applications, however, accurate probability estimation is required in order to make optimal decisions. Probability estimation is usually measured by conditional log likelihood (CLL). Several learning algorithms have been proposed recently to extend naive Bayes for high CLL, such as ELR [8, 9] and BNC-2P [10]. Unfortunately, their computational complexity is relatively high. Is there a simple but effective and efficient approach to improving the probability estimation of naive Bayes? In this paper, we propose using feature selection for this purpose. More precisely, a search process selects a subset of attributes, and a naive Bayes classifier is then built on the selected attribute set. Feature selection has already been applied successfully to naive Bayes, where it achieves significant improvement in classification accuracy. Among the feature selection algorithms for naive Bayes, the selective Bayesian classifier (SBC) of Langley et al. [13] demonstrates good performance. In this paper, we first study the performance of SBC in terms of probability estimation, and then propose an improved SBC algorithm, SBC-CLL, in which the CLL score, rather than classification accuracy, is used directly for attribute selection. Our experiments show that both SBC and SBC-CLL achieve significant improvement over naive Bayes, and that SBC-CLL substantially outperforms SBC, in probability estimation measured by CLL. Our work provides an efficient and surprisingly effective approach to improving the probability estimation of naive Bayes.
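
The idea of selecting attributes by CLL score can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the search procedure, the smoothing, or the data used for scoring, so the greedy forward search, the Laplace smoothing, the held-out validation set, the `min_gain` stopping threshold, and all function names below are assumptions made for illustration. Attributes are assumed to be categorical, and CLL is taken here as the sum over examples of log P(c | x) for the true class c.

```python
import math
from collections import Counter


def train_nb(X, y, attrs):
    """Fit a categorical naive Bayes on the attribute indices in attrs.

    Laplace smoothing is an assumption of this sketch, not taken from the paper.
    """
    classes = sorted(set(y))
    class_count = Counter(y)
    prior = {c: (class_count[c] + 1) / (len(y) + len(classes)) for c in classes}
    # cond[(a, c)] counts the values of attribute a among examples of class c.
    cond = {(a, c): Counter() for a in attrs for c in classes}
    # Number of distinct values per attribute, as seen in the training data.
    n_values = {a: len({row[a] for row in X}) for a in attrs}
    for row, c in zip(X, y):
        for a in attrs:
            cond[(a, c)][row[a]] += 1
    return classes, prior, cond, n_values, class_count, tuple(attrs)


def predict_log_proba(model, row):
    """Return {class: log P(class | row)} under the naive Bayes model."""
    classes, prior, cond, n_values, class_count, attrs = model
    log_joint = {}
    for c in classes:
        lp = math.log(prior[c])
        for a in attrs:
            lp += math.log((cond[(a, c)][row[a]] + 1) / (class_count[c] + n_values[a]))
        log_joint[c] = lp
    # Normalise with log-sum-exp for numerical stability.
    m = max(log_joint.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in log_joint.values()))
    return {c: v - log_z for c, v in log_joint.items()}


def cll(model, X, y):
    """Conditional log likelihood: sum of log P(true class | attributes)."""
    return sum(predict_log_proba(model, row)[c] for row, c in zip(X, y))


def select_by_cll(X_train, y_train, X_val, y_val, min_gain=1e-9):
    """Greedy forward selection that keeps an attribute only if it raises CLL.

    Scoring on a separate validation set and stopping when the gain falls
    below min_gain are both assumptions of this sketch.
    """
    n_attrs = len(X_train[0])
    selected = []
    best = cll(train_nb(X_train, y_train, selected), X_val, y_val)
    while len(selected) < n_attrs:
        candidates = [a for a in range(n_attrs) if a not in selected]
        score, attr = max(
            (cll(train_nb(X_train, y_train, selected + [a]), X_val, y_val), a)
            for a in candidates
        )
        if score <= best + min_gain:
            break  # no remaining attribute improves CLL
        selected.append(attr)
        best = score
    return selected, best
```

Calling `select_by_cll(X_train, y_train, X_val, y_val)` with rows given as tuples of categorical values returns the chosen attribute indices and the CLL they reach. Replacing `cll` with an accuracy function inside the search would give an accuracy-driven selection in the spirit of SBC, which is the contrast the abstract draws.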