On predictive distributions and Bayesian networks

  • Authors:
  • P. Kontkanen, P. Myllymäki, T. Silander, H. Tirri, and P. Grünwald

  • Affiliations:
  • P. Kontkanen, P. Myllymäki, T. Silander, H. Tirri: Complex Systems Computation Group (CoSCo), P.O. Box 26, Department of Computer Science, FIN-00014 University of Helsinki, Finland. pkontkan@cs.helsinki.fi, myllymak@cs.helsinki.fi, tsilande@cs.helsinki.fi, tirri@cs.helsinki.fi; http://www.cs.Helsinki.FI/research/cosco/
  • P. Grünwald: Department of Computer Science, Stanford University, Stanford, CA 94305, USA. grunwald@cs.stanford.edu

  • Venue:
  • Statistics and Computing
  • Year:
  • 2000

Abstract

In this paper we are interested in discrete prediction problems for a decision-theoretic setting, where the task is to compute the predictive distribution for a finite set of possible alternatives. This question is first addressed in a general Bayesian framework, where we consider a set of probability distributions defined by some parametric model class. Given a prior distribution on the model parameters and a set of sample data, one possible approach for determining a predictive distribution is to fix the parameters to the instantiation with the maximum a posteriori probability. A more accurate predictive distribution can be obtained by computing the evidence (marginal likelihood), i.e., the integral over all the individual parameter instantiations. As an alternative to these two approaches, we demonstrate how to use Rissanen's new definition of stochastic complexity for determining predictive distributions, and show how the evidence predictive distribution with Jeffreys' prior approaches the new stochastic complexity predictive distribution in the limit with increasing amount of sample data. To compare the alternative approaches in practice, each of the predictive distributions discussed is instantiated in the Bayesian network model family case. In particular, to determine Jeffreys' prior for this model family, we show how to compute the (expected) Fisher information matrix for a fixed but arbitrary Bayesian network structure. In the empirical part of the paper the predictive distributions are compared by using the simple tree-structured Naive Bayes model, which is used in the experiments for computational reasons. The experimentation with several public domain classification datasets suggests that the evidence approach produces the most accurate predictions in the log-score sense. The evidence-based methods are also quite robust in the sense that they predict surprisingly well even when only a small fraction of the full training set is used.
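To illustrate the contrast between the first two approaches the abstract describes, here is a minimal sketch (not the authors' code) for the simplest case, a single discrete variable with a Dirichlet prior. The MAP approach plugs the single most probable parameter instantiation into the model, while the evidence approach integrates the parameters out, which for a Dirichlet posterior reduces to its mean; Jeffreys' prior for a K-valued multinomial is Dirichlet(1/2, ..., 1/2). Function names and the toy counts are illustrative assumptions.

```python
def map_predictive(counts, alpha):
    """Plug the MAP parameters of the Dirichlet posterior into the model.

    The closed-form mode (n_k + a_k - 1) / (n + a - K) is valid only when
    every posterior parameter n_k + a_k >= 1, i.e. the mode lies in the
    interior of the probability simplex.
    """
    n, a, k = sum(counts), sum(alpha), len(counts)
    return [(c + al - 1.0) / (n + a - k) for c, al in zip(counts, alpha)]

def evidence_predictive(counts, alpha):
    """Integrate over all parameter instantiations.

    For a Dirichlet prior the evidence predictive is the posterior mean
    (n_k + a_k) / (n + a), so no outcome ever gets probability zero.
    """
    n, a = sum(counts), sum(alpha)
    return [(c + al) / (n + a) for c, al in zip(counts, alpha)]

counts = [3, 1, 0]                                # toy counts, 3-valued variable
print(map_predictive(counts, [1, 1, 1]))          # uniform prior -> ML plug-in
# -> [0.75, 0.25, 0.0]
print(evidence_predictive(counts, [0.5] * 3))     # Jeffreys' prior
```

Note how the plug-in predictive assigns probability zero to the unseen third value, whereas the evidence predictive with Jeffreys' prior keeps it strictly positive; this difference is what the log-score comparisons in the paper measure, since a zero probability on an observed outcome yields an infinite log-loss.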