The EM algorithm in a distributed computing environment for modelling environmental space-time data

  • Authors:
  • Alessandro Fassò; Michela Cameletti

  • Affiliations:
  • University of Bergamo, Department of Information Technology and Mathematical Methods, Viale Marconi n. 5, 24044 Dalmine (BG), Bergamo, Italy (both authors)

  • Venue:
  • Environmental Modelling & Software
  • Year:
  • 2009

Abstract

Statistical models for spatio-temporal data are increasingly used in environmetrics, climate change research, epidemiology, remote sensing and dynamic risk mapping. Because of the complex relationships among the variables involved and the high dimensionality of the parameter set to be estimated, techniques that allow the model to be defined and estimated stepwise are welcome. In this context, hierarchical models are a suitable solution, since they make it possible to build the joint dynamics and the full likelihood from simpler conditional submodels. Moreover, for a large class of hierarchical models, maximum likelihood estimation can be simplified using the Expectation-Maximization (EM) algorithm. In this paper, we define the EM algorithm for a rather general three-stage spatio-temporal hierarchical model that also includes spatio-temporal covariates. In particular, we show that most of the parameters are updated in closed form, which guarantees the stability of the algorithm, unlike classical Newton-Raphson-type techniques for maximizing the full likelihood function. We also illustrate how the EM algorithm can be combined with a spatio-temporal parametric bootstrap to evaluate parameter accuracy through standard errors and non-Gaussian confidence intervals. To this end, a new software library has been developed as a standard R package. Finally, realistic simulations in a distributed computing environment allow us to discuss the algorithm's properties and performance, including the number of iterations to convergence and computing times.
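The abstract does not give the model equations or the interface of the R package. The following is therefore only a minimal Python sketch of the general idea, under stated assumptions. It assumes a deliberately simplified, hypothetical model in which n stations share a single scalar latent AR(1) process z_t, observed with independent Gaussian noise plus a covariate effect X_t β. The E-step uses a Kalman filter and a Rauch-Tung-Striebel smoother. The M-step updates β, g, q and r in closed form, which mirrors the abstract's point that closed-form updates avoid Newton-Raphson iterations. The function names (kalman_smoother, em_step, fit_em), the fixed prior on z_0 and the simulated data are illustrative choices. They are not the authors' three-stage model, code or data.

```python
import numpy as np


def kalman_smoother(y, X, beta, g, q, r, mu0=0.0, p0=10.0):
    """E-step for a HYPOTHETICAL toy model (not the paper's three-stage model):
        y_t = X_t @ beta + 1_n * z_t + e_t,   e_t ~ N(0, r I_n)
        z_t = g * z_{t-1} + eta_t,            eta_t ~ N(0, q),  z_0 ~ N(mu0, p0)
    Returns smoothed means, variances, lag-one covariances and the log-likelihood."""
    T, n = y.shape
    zf, pf = np.zeros(T + 1), np.zeros(T + 1)  # filtered moments; index 0 is z_0
    zp, pp = np.zeros(T + 1), np.zeros(T + 1)  # one-step-ahead predicted moments
    zf[0], pf[0] = mu0, p0
    loglik = 0.0
    for t in range(1, T + 1):
        zp[t] = g * zf[t - 1]
        pp[t] = g * g * pf[t - 1] + q
        resid = y[t - 1] - X[t - 1] @ beta     # data minus covariate effect
        v = resid - zp[t]                      # innovation vector
        # Innovation covariance S = pp * 1 1' + r I, handled via Sherman-Morrison
        logdet = (n - 1) * np.log(r) + np.log(r + n * pp[t])
        quad = v @ v / r - pp[t] * v.sum() ** 2 / (r * (r + n * pp[t]))
        loglik -= 0.5 * (n * np.log(2 * np.pi) + logdet + quad)
        # Information-form update, valid because H = 1_n and R = r I
        pf[t] = 1.0 / (1.0 / pp[t] + n / r)
        zf[t] = pf[t] * (zp[t] / pp[t] + resid.sum() / r)
    zs, ps = zf.copy(), pf.copy()
    plag = np.zeros(T + 1)
    for t in range(T - 1, -1, -1):
        J = pf[t] * g / pp[t + 1]              # RTS smoother gain
        zs[t] = zf[t] + J * (zs[t + 1] - zp[t + 1])
        ps[t] = pf[t] + J * J * (ps[t + 1] - pp[t + 1])
        plag[t + 1] = J * ps[t + 1]            # Cov(z_{t+1}, z_t | all data)
    return zs, ps, plag, loglik


def em_step(y, X, beta, g, q, r):
    """One EM iteration: smoothing (E-step), then closed-form updates (M-step)."""
    T, n = y.shape
    zs, ps, plag, loglik = kalman_smoother(y, X, beta, g, q, r)
    ez2 = zs ** 2 + ps                         # E[z_t^2 | data]
    s11, s00 = ez2[1:].sum(), ez2[:-1].sum()
    s10 = (zs[1:] * zs[:-1] + plag[1:]).sum()  # sum of E[z_t z_{t-1} | data]
    g_new = s10 / s00
    q_new = (s11 - g_new * s10) / T
    # With R = r I, the beta update is ordinary least squares on y - E[z]
    Xf = X.reshape(T * n, -1)
    target = (y - zs[1:, None]).reshape(T * n)
    beta_new = np.linalg.solve(Xf.T @ Xf, Xf.T @ target)
    res = y - X @ beta_new - zs[1:, None]
    r_new = ((res ** 2).sum() + n * ps[1:].sum()) / (n * T)
    return beta_new, g_new, q_new, r_new, loglik


def fit_em(y, X, n_iter=200):
    beta, g, q, r = np.zeros(X.shape[2]), 0.5, 1.0, 1.0  # arbitrary start values
    history = []                               # log-likelihood before each update
    for _ in range(n_iter):
        beta, g, q, r, ll = em_step(y, X, beta, g, q, r)
        history.append(ll)
    return beta, g, q, r, history


# Illustrative simulated data: T time points, n stations, one covariate
rng = np.random.default_rng(0)
T, n = 200, 15
X = rng.normal(size=(T, n, 1))
z = np.zeros(T + 1)
for t in range(1, T + 1):
    z[t] = 0.8 * z[t - 1] + rng.normal(scale=np.sqrt(0.5))
y = X @ np.array([2.0]) + z[1:, None] + rng.normal(size=(T, n))

beta_hat, g_hat, q_hat, r_hat, hist = fit_em(y, X)
print(beta_hat, g_hat, q_hat, r_hat)
```

Because every M-step in this toy model is an exact closed-form maximizer, the recorded log-likelihood should never decrease, which gives a convenient check on an implementation. A parametric bootstrap in the spirit of the abstract would repeatedly simulate data from the fitted parameters and refit with fit_em; that loop is not shown.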