Scaling Properties of Common Statistical Operators for Gridded Datasets

  • Authors: Charles S. Zender; Harry Mangalam

  • Affiliations: Department of Earth System Science, University of California, Irvine (both authors)

  • Venue: International Journal of High Performance Computing Applications
  • Year: 2007

Abstract

An accurate cost model that accounts for dataset size and structure can help optimize geoscience data analysis. We develop and apply a computational model to estimate the cost of arithmetic operations on gridded datasets typical of satellite or climate-model origin. For these dataset geometries, our model predicts data reduction scalings that agree with measurements of widely used geoscience data processing software, the netCDF Operators (NCO). I/O performance and library design dominate throughput for simple analysis (e.g., dataset differencing). Dataset structure can reduce analysis throughput tenfold relative to unstructured datasets of the same size. We demonstrate algorithmic optimizations that substantially increase throughput for more complex, arithmetic-dominated analysis such as weighted averaging of multi-dimensional data. These scaling properties can help estimate the costs of distribution strategies for data reduction in cluster and grid environments.