Joint discriminative-generative modelling based on statistical tests for classification

  • Authors:
  • Jing-Hao Xue; D. Michael Titterington

  • Affiliations:
  • Department of Statistics, University of Glasgow, Glasgow G12 8QQ, UK and Department of Statistical Science, University College London, London WC1E 6BT, UK; Department of Statistics, University of Glasgow, Glasgow G12 8QQ, UK

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2010

Abstract

In statistical pattern classification, generative approaches, such as linear discriminant analysis (LDA), assume a data-generating process (DGP), whereas discriminative approaches, such as linear logistic regression (LLR), do not model the DGP. In general, a generative classifier performs better than its discriminative counterpart if the DGP is well-specified and worse than the latter if the DGP is clearly mis-specified. In view of this, this paper presents a joint discriminative-generative modelling (JoDiG) approach, by partitioning predictor variables X into two sub-vectors, namely X_G, to which a generative approach is applied, and X_D, to be treated by a discriminative approach. This partitioning of X is based on statistical tests of the assumed DGP: the variables that clearly fail the tests are grouped as X_D and the rest as X_G. Then the generative and discriminative approaches are combined in a probabilistic rather than a heuristic way. The principle of the JoDiG approach is quite generic, but for illustrative purposes the numerical studies in the paper focus on a widely used case, in which the DGP assumes a multivariate normal distribution for each class. In this case, the JoDiG approach uses LDA for X_G and LLR for X_D. Numerical experiments on real and simulated data demonstrate that the performance of this new approach to classification is similar to or better than that of its discriminative and generative counterparts, in particular when the size of the training set is comparable to the dimension of the data.
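
The test-based partitioning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which tests are used, so the sketch assumes a per-class univariate Jarque-Bera normality test with a significance level of 0.01. The function names `jarque_bera_pvalue` and `partition_variables`, the threshold, and the toy data are all hypothetical. Variables that fail the test in any class go to the discriminative group; the rest go to the generative group.

```python
import math
from statistics import NormalDist


def jarque_bera_pvalue(values):
    """Asymptotic p-value of the Jarque-Bera normality test (assumed test choice)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0.0:
        return 0.0  # a constant sample is treated as clearly non-normal
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    return math.exp(-jb / 2.0)


def partition_variables(X, y, alpha=0.01):
    """Split column indices into (generative, discriminative) groups.

    A column is sent to the discriminative group if it rejects normality
    at level alpha within at least one class (hypothetical rule).
    """
    classes = sorted(set(y))
    generative, discriminative = [], []
    for j in range(len(X[0])):
        fails = any(
            jarque_bera_pvalue([row[j] for row, lab in zip(X, y) if lab == c]) < alpha
            for c in classes
        )
        (discriminative if fails else generative).append(j)
    return generative, discriminative


# Toy data: column 0 follows a normal shape within each class, column 1 an
# exponential shape. Quantile grids are used instead of random draws so the
# example is deterministic.
n = 200
grid = [(i + 0.5) / n for i in range(n)]
normal_part = [NormalDist().inv_cdf(p) for p in grid]
expo_part = [-math.log(1.0 - p) for p in grid]

X, y = [], []
for label, shift in ((0, 0.0), (1, 1.5)):
    for z, e in zip(normal_part, expo_part):
        X.append([z + shift, e + shift])
        y.append(label)

generative_idx, discriminative_idx = partition_variables(X, y)
print("generative columns:", generative_idx)
print("discriminative columns:", discriminative_idx)
```

In this toy setting the normal-shaped column lands in the generative group and the exponential-shaped column in the discriminative group. Fitting LDA on the former and LLR on the latter, and combining them probabilistically, is left out of the sketch.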