The naive Bayes classifier has proved very effective in many real-world applications. Its performance usually benefits from an accurate estimation of univariate conditional probabilities and from variable selection. However, although desirable, variable selection is prone to overfitting. In this paper, we introduce a Bayesian regularization technique to select the most probable subset of variables compliant with the naive Bayes assumption. We also study the limits of Bayesian model averaging under the naive Bayes assumption and introduce a new weighting scheme based on the ability of the models to conditionally compress the class labels. The weighting scheme on the models reduces to a weighting scheme on the variables, yielding a naive Bayes classifier with "soft variable selection". Extensive experiments show that the compression-based averaged classifier outperforms the Bayesian model averaging scheme.
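To make the "soft variable selection" idea concrete, the sketch below shows a naive Bayes classifier whose per-variable weights w_j in [0, 1] exponentiate the univariate conditional likelihoods, so that w_j = 1 keeps variable j, w_j = 0 discards it, and intermediate values interpolate between the two. This is a minimal illustration under assumed Gaussian conditionals; the class name, the Gaussian estimator, and the externally supplied weights are all hypothetical stand-ins, not the paper's actual compression-based weighting scheme or its conditional probability estimator.

```python
import numpy as np

class SoftSelectiveNaiveBayes:
    """Naive Bayes with per-variable weights (hypothetical sketch).

    Scores classes via
        log P(c | x)  ~  log P(c) + sum_j w_j * log P(x_j | c),
    so the weights realize a soft form of variable selection.
    Gaussian univariate conditionals are assumed for simplicity;
    the weights themselves must be computed elsewhere.
    """

    def __init__(self, weights):
        self.w = np.asarray(weights, dtype=float)  # one weight per variable

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Per-class univariate Gaussian parameters (naive Bayes assumption).
        self.mu_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes_])
        self.log_prior_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def predict(self, X):
        scores = []
        for mu, var, lp in zip(self.mu_, self.var_, self.log_prior_):
            # Weighted univariate Gaussian log-likelihoods, summed over variables.
            ll = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var)
            scores.append(lp + (self.w * ll).sum(axis=1))
        return self.classes_[np.argmax(np.column_stack(scores), axis=1)]

# Toy usage: weights of 1.0, 0.5, and 0.0 keep, attenuate, and drop variables.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # third variable is noise
    clf = SoftSelectiveNaiveBayes(weights=[1.0, 0.5, 0.0]).fit(X, y)
    print((clf.predict(X) == y).mean())
```

Setting every weight to 0 or 1 recovers hard variable selection, which is why a model-averaging weight on each candidate variable subset can be pushed down to a single averaged weight per variable.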