Maximum Margin Bayesian Network Classifiers

  • Authors:
  • Franz Pernkopf; Michael Wohlmayr; Sebastian Tschiatschek

  • Affiliations:
  • Graz University of Technology, Graz (all authors)

  • Venue:
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Year:
  • 2012

Abstract

We present a maximum margin parameter learning algorithm for Bayesian network classifiers using a conjugate gradient (CG) method for optimization. In contrast to previous approaches, we maintain the normalization constraints on the parameters of the Bayesian network during optimization, i.e., the probabilistic interpretation of the model is not lost. This enables us to handle missing features in discriminatively optimized Bayesian networks. In experiments, we compare the classification performance of maximum margin parameter learning to conditional likelihood and maximum likelihood learning approaches. Discriminative parameter learning significantly outperforms generative maximum likelihood estimation for naive Bayes and tree augmented naive Bayes structures on all considered data sets. Furthermore, maximizing the margin dominates the conditional likelihood approach in terms of classification performance in most cases. We provide results for a recently proposed maximum margin optimization approach based on convex relaxation [1]. While the classification results are highly similar, our CG-based optimization is computationally up to orders of magnitude faster. Margin-optimized Bayesian network classifiers achieve classification performance comparable to support vector machines (SVMs) while using fewer parameters. Moreover, we show that feature values that are unexpectedly missing during classification are handled naturally by discriminatively optimized Bayesian network classifiers, whereas discriminative classifiers usually require the unknown feature values to be imputed first.
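
To make the idea concrete, below is a minimal sketch of margin-based parameter learning for a discrete naive Bayes classifier. It is not the authors' implementation: the normalization constraints are maintained here by reparameterizing the class prior and conditional probability tables with a softmax, a plain hinge on the multiclass log-margin stands in for the paper's objective, and SciPy's general-purpose CG optimizer with numerical gradients replaces the paper's tailored CG procedure. All sizes and data are toy values.

```python
# Minimal sketch (not the authors' code) of maximum margin parameter
# learning for a discrete naive Bayes classifier. Normalization is kept
# via softmax reparameterization; the hinge-type margin objective is
# minimized with SciPy's conjugate gradient method.
import numpy as np
from scipy.optimize import minimize

C, F, V = 3, 4, 5                        # classes, features, values per feature (toy sizes)
rng = np.random.default_rng(0)
X = rng.integers(0, V, size=(200, F))    # toy discrete data
y = rng.integers(0, C, size=200)

def log_joint(theta, X):
    """log p(c, x) for every class, with softmax-normalized parameters."""
    prior = theta[:C] - np.logaddexp.reduce(theta[:C])           # log p(c)
    cpt = theta[C:].reshape(C, F, V)
    cpt = cpt - np.logaddexp.reduce(cpt, axis=2, keepdims=True)  # log p(x_f | c)
    return prior[None, :] + np.array(
        [cpt[:, np.arange(F), x].sum(axis=1) for x in X])        # shape (N, C)

def margin_objective(theta, gamma=1.0):
    # Multiclass log-margin: log p(y, x) minus the best competing class.
    # The paper's exact objective and constraint handling differ; a plain
    # hinge is used here for brevity.
    ll = log_joint(theta, X)
    true = ll[np.arange(len(y)), y]
    ll_other = ll.copy()
    ll_other[np.arange(len(y)), y] = -np.inf
    margin = true - ll_other.max(axis=1)
    return np.maximum(0.0, gamma - margin).sum()

theta0 = rng.normal(size=C + C * F * V)
res = minimize(margin_objective, theta0, method="CG")
acc = (log_joint(res.x, X).argmax(axis=1) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

Because the softmax reparameterization keeps every conditional probability table normalized by construction, the constrained problem becomes an unconstrained one suitable for CG, and the learned parameters remain a valid joint distribution; this is what allows missing features to be marginalized out at classification time rather than imputed.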