Network-based sparse Bayesian classification

  • Authors:
  • Jose Miguel Hernández-Lobato; Daniel Hernández-Lobato; Alberto Suárez

  • Affiliations:
  • Escuela Politécnica Superior, Universidad Autónoma de Madrid, Francisco Tomás y Valiente 11, Madrid 28049, Spain; Machine Learning Group, ICTEAM Institute, Université Catholique de Louvain, Place Sainte Barbe 2, B-1348 Louvain-la-Neuve, Belgium; Escuela Politécnica Superior, Universidad Autónoma de Madrid, Francisco Tomás y Valiente 11, Madrid 28049, Spain

  • Venue:
  • Pattern Recognition
  • Year:
  • 2011

Abstract

In some classification problems there is prior information about the joint relevance of groups of features. This knowledge can be encoded in a network whose nodes correspond to features and whose edges connect features that should be either both excluded or both included in the predictive model. In this paper, we introduce a novel network-based sparse Bayesian classifier (NBSBC) that makes use of the information about feature dependencies encoded in such a network to improve its prediction accuracy, especially in problems with a high-dimensional feature space and a limited amount of available training data. Approximate Bayesian inference is efficiently implemented in this model using expectation propagation. The NBSBC method is validated on four real-world classification problems from different domains of application: phonemes, handwritten digits, precipitation records and gene expression measurements. A comparison with state-of-the-art methods (support vector machine, network-based support vector machine and graph lasso) shows that NBSBC has excellent predictive performance. It has the best accuracy in three of the four problems analyzed and ranks second in the modeling of the precipitation data. NBSBC also yields accurate and robust rankings of the individual features according to their relevance to the solution of the classification problem considered. The accuracy and stability of these estimates are important factors in the good overall performance of this method.
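
As an illustration of how a feature network can favor jointly including or excluding connected features, the sketch below scores binary inclusion vectors with an Ising-like Markov random field. This is not taken from the abstract and should not be read as the paper's exact prior: the functional form, the function name `log_prior_unnormalized`, the parameters `alpha` and `beta`, and the toy network are all assumptions made for the example.

```python
from typing import Iterable, Sequence, Tuple

Edge = Tuple[int, int]


def log_prior_unnormalized(
    z: Sequence[int],
    edges: Iterable[Edge],
    alpha: float = -1.0,  # assumed sparsity weight: negative values penalize inclusion
    beta: float = 2.0,    # assumed coupling weight: rewards agreement along each edge
) -> float:
    """Unnormalized log-prior of a binary inclusion vector z (1 = feature included).

    Hypothetical Ising-like form:
        alpha * sum_i z_i  +  beta * #{(i, j) in edges : z_i == z_j}
    """
    sparsity_term = alpha * sum(z)
    agreement_term = beta * sum(1 for i, j in edges if z[i] == z[j])
    return sparsity_term + agreement_term


# Hypothetical toy network: six features forming two connected groups,
# {0, 1, 2} and {3, 4, 5}.
edges = [(0, 1), (1, 2), (3, 4), (4, 5)]

# Both vectors include three features, but only the first respects the network.
coherent = [1, 1, 1, 0, 0, 0]
scattered = [1, 0, 1, 0, 1, 0]

lp_coherent = log_prior_unnormalized(coherent, edges)
lp_scattered = log_prior_unnormalized(scattered, edges)

print(lp_coherent, lp_scattered)
```

Both configurations select the same number of features, so the sparsity term is identical; the coherent one scores higher only because connected features agree on inclusion. A full classifier would combine such a prior with a likelihood and an approximate inference scheme such as expectation propagation, which this sketch does not attempt.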