Spatial model fitting for large datasets with applications to climate and microarray problems

  • Authors:
  • Reinhard Furrer; Stephan R. Sain

  • Affiliations:
  • Mathematical and Computer Sciences, Colorado School of Mines, Golden, USA; Geophysical Statistics Project, National Center for Atmospheric Research, Boulder, USA

  • Venue:
  • Statistics and Computing
  • Year:
  • 2009

Abstract

Many problems in the environmental and biological sciences involve the analysis of large quantities of data. Further, the data in these problems are often subject to various types of structure and, in particular, spatial dependence. Traditional model fitting often fails for such large datasets because it is difficult not only to specify but also to compute with the full covariance matrix describing the spatial dependence. We propose a very general type of mixed model that has a random spatial component. Recognizing that spatial covariance matrices often exhibit a large number of zero or near-zero entries, we use covariance tapering to force near-zero entries to zero. Then, taking advantage of the sparse nature of such tapered covariance matrices, we use backfitting to estimate the fixed and random model parameters. The novelty of the paper is the combination of the two techniques, tapering and backfitting, to model and analyze spatial datasets several orders of magnitude larger than those typically analyzed with conventional approaches. Results are demonstrated with two datasets. The first consists of regional climate model output from an experiment with two regional models and two driver models arranged in a two-by-two layout. The second is microarray data used to build a profile of differentially expressed genes relating to cerebral vascular malformations, an important cause of hemorrhagic stroke and seizures.
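The tapering step can be illustrated with a minimal sketch. It assumes an exponential covariance model and a Wendland-type taper (1 - d/theta)_+^4 (1 + 4d/theta); the function names, the covariance model, and the parameter values are hypothetical illustrations, not the authors' code, and the backfitting step is not shown. Multiplying a covariance elementwise by a compactly supported positive definite taper keeps the result positive definite (Schur product theorem) while setting every entry at distance theta or beyond to exactly zero. The result is therefore stored sparsely.

```python
import math


def exponential_cov(d, sill=1.0, range_=1.0):
    """Hypothetical covariance model: C(d) = sill * exp(-d / range_)."""
    return sill * math.exp(-d / range_)


def wendland_taper(d, theta):
    """Wendland-type taper (1 - d/theta)_+^4 * (1 + 4 d/theta); zero for d >= theta."""
    if d >= theta:
        return 0.0
    r = d / theta
    return (1.0 - r) ** 4 * (1.0 + 4.0 * r)


def tapered_covariance(coords, theta, sill=1.0, range_=1.0):
    """Return the tapered covariance as a sparse dict {(i, j): value}.

    Only nonzero entries are stored. The O(n^2) double loop is for
    illustration only; a large-scale version would find neighbours
    within distance theta with a spatial index instead.
    """
    n = len(coords)
    entries = {}
    for i in range(n):
        for j in range(i, n):
            d = math.dist(coords[i], coords[j])  # Euclidean distance (Python 3.8+)
            t = wendland_taper(d, theta)
            if t > 0.0:
                v = exponential_cov(d, sill, range_) * t
                entries[(i, j)] = v
                entries[(j, i)] = v  # covariance matrices are symmetric
    return entries


if __name__ == "__main__":
    # Ten sites on a line, spaced one unit apart; taper range theta = 2.5.
    sites = [(float(k),) for k in range(10)]
    cov = tapered_covariance(sites, theta=2.5)
    print(f"{len(cov)} nonzero entries out of {len(sites) ** 2}")
```

With ten equally spaced sites and theta = 2.5, only pairs at distance 0, 1, or 2 survive, so 44 of the 100 entries are nonzero. The fraction of nonzero entries shrinks further as the number of sites grows while theta stays fixed, which is the sparsity that makes large datasets tractable.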