Spatial model fitting for large datasets with applications to climate and microarray problems

Authors:
Reinhard Furrer;Stephan R. Sain
Affiliations:
Mathematical and Computer Sciences, Colorado School of Mines, Golden, USA;Geophysical Statistics Project, National Center for Atmospheric Research, Boulder, USA
Venue:
Statistics and Computing
Year:
2009

Abstract

Many problems in the environmental and biological sciences involve the analysis of large quantities of data. Further, the data in these problems often exhibit various types of structure, in particular spatial dependence. Traditional model fitting often fails because of the size of the datasets: it is difficult not only to specify but also to compute with the full covariance matrix describing the spatial dependence. We propose a very general type of mixed model that has a random spatial component. Recognizing that spatial covariance matrices often exhibit a large number of zero or near-zero entries, we use covariance tapering to force near-zero entries to zero. Then, taking advantage of the sparse nature of such tapered covariance matrices, we use backfitting to estimate the fixed and random model parameters. The novelty of the paper is the combination of the two techniques, tapering and backfitting, to model and analyze spatial datasets several orders of magnitude larger than those typically analyzed with conventional approaches. Results are demonstrated with two datasets. The first consists of regional climate model output from an experiment with two regional and two driver models arranged in a two-by-two layout. The second is microarray data used to build a profile of differentially expressed genes relating to cerebral vascular malformations, an important cause of hemorrhagic stroke and seizures.
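
To make the combination of tapering and backfitting concrete, the following is a minimal sketch and not the authors' implementation. It assumes a model y = Xβ + h + ε with a spatial random effect h and independent noise ε, an exponential covariance multiplied by a spherical taper, and covariance parameters that are held fixed rather than estimated. The function names `tapered_covariance` and `backfit`, and all parameter values, are hypothetical choices made for illustration.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu
from scipy.spatial import cKDTree


def tapered_covariance(coords, sill=1.0, range_=0.1, taper_radius=0.15):
    """Exponential covariance times a spherical taper, stored as a sparse matrix.

    Assumption: both the covariance family and the taper are illustrative choices.
    """
    n = coords.shape[0]
    tree = cKDTree(coords)
    # Only pairs closer than taper_radius get an entry; all others are exactly zero.
    pairs = tree.query_pairs(taper_radius, output_type="ndarray")
    i, j = pairs[:, 0], pairs[:, 1]
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    r = d / taper_radius
    # Spherical taper (1 - r)^2 (1 + r/2), which vanishes at r = 1.
    vals = sill * np.exp(-d / range_) * (1.0 - r) ** 2 * (1.0 + 0.5 * r)
    upper = sparse.coo_matrix((vals, (i, j)), shape=(n, n))
    return (upper + upper.T + sill * sparse.identity(n)).tocsc()


def backfit(y, X, Sigma, nugget, tol=1e-10, max_iter=10000):
    """Alternate between the fixed effects beta and the spatial effect h."""
    n = y.shape[0]
    # Factor the sparse matrix (Sigma + nugget * I) once and reuse it.
    lu = splu((Sigma + nugget * sparse.identity(n)).tocsc())
    XtX = X.T @ X
    beta = np.zeros(X.shape[1])
    h = np.zeros(n)
    for _ in range(max_iter):
        # Fixed-effect step: least squares on the data minus the spatial effect.
        beta_new = np.linalg.solve(XtX, X.T @ (y - h))
        # Random-effect step: h = Sigma (Sigma + nugget I)^{-1} (y - X beta).
        h = Sigma @ lu.solve(y - X @ beta_new)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, h


# Small simulated example (all values are made up for illustration).
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(size=(n, 2))
X = np.column_stack([np.ones(n), coords[:, 0]])
Sigma = tapered_covariance(coords)
nugget = 0.5
L = np.linalg.cholesky(Sigma.toarray() + nugget * np.eye(n))
y = X @ np.array([1.0, 2.0]) + L @ rng.standard_normal(n)
beta_hat, h_hat = backfit(y, X, Sigma, nugget)
```

In this sketch the only large linear solve involves the sparse matrix `Sigma + nugget * I`, which is factored once. With the covariance parameters fixed, the iteration converges to the generalized least squares estimate of β, so the dense solve can be avoided entirely. The dense Cholesky factor above is used only to simulate a small test dataset.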