Efficient data IO for a Parallel Global Cloud Resolving Model

  • Authors:
  • Bruce Palmer; Annette Koontz; Karen Schuchardt; Ross Heikes; David Randall

  • Affiliations:
  • Computational Sciences and Mathematics Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA (Palmer, Koontz, Schuchardt); Department of Atmospheric Sciences, Colorado State University, Fort Collins, CO 80523, USA (Heikes, Randall)

  • Venue:
  • Environmental Modelling & Software
  • Year:
  • 2011

Abstract

Execution of a Global Cloud Resolving Model (GCRM) at target resolutions of 2-4 km will generate, at a minimum, tens of gigabytes of data per variable per snapshot. Writing this data to disk without creating a serious bottleneck in the execution of the GCRM code, while also supporting efficient post-execution data analysis, is a significant challenge. This paper discusses an Input/Output (IO) application programming interface (API) for the GCRM that efficiently moves data from the model to disk while maintaining support for community-standard formats, avoiding the creation of very large numbers of files, and supporting efficient analysis. Several aspects of the API are discussed in detail. First, we discuss the output data layout, which linearizes the data in a consistent way that is independent of the number of processors used to run the simulation and provides a convenient format for subsequent analyses. Second, we discuss the flexible API, which enables modelers to easily add variables to the output stream by specifying where in the GCRM code these variables are located, and to configure the choice of outputs and the distribution of data across files. This flexibility allows model developers to add new data fields to the output as the model develops and new physics is added. It also lets users of the GCRM code adjust the output frequency and the number of fields written to suit the needs of individual calculations. Third, we describe the mapping to the NetCDF data model, with an emphasis on the grid description. Fourth, we describe the messaging algorithms and IO aggregation strategies used to achieve high bandwidth while many processors write concurrently to shared files. We conclude with initial performance results.
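The first and fourth points (a processor-independent data layout, and aggregating data onto a smaller set of writers that fill a shared file) can be illustrated with a small serial simulation. This is a hedged sketch, not the GCRM IO API: it assumes every grid cell carries a fixed global index, and the names `decompose`, `aggregate` and `write_variable`, the round-robin decomposition, and the contiguous-block assignment of writers are all hypothetical choices made for illustration. A real implementation would exchange data with MPI messages and write through a parallel library rather than Python lists.

```python
from typing import Dict, List, Tuple

# (global_cell_index, value) pairs held by one simulated compute rank.
Chunk = List[Tuple[int, float]]


def decompose(values: List[float], nprocs: int) -> List[Chunk]:
    """Hypothetical decomposition: deal cells out round-robin to nprocs ranks."""
    chunks: List[Chunk] = [[] for _ in range(nprocs)]
    for gid, v in enumerate(values):
        chunks[gid % nprocs].append((gid, v))
    return chunks


def aggregate(chunks: List[Chunk], ncells: int, nwriters: int) -> Tuple[Dict[int, List[float]], int]:
    """Route every value to the writer that owns a contiguous block of the
    global index space, placing it at its offset inside that block."""
    block = -(-ncells // nwriters)  # ceiling division: cells per writer
    buffers = {w: [0.0] * max(0, min(block, ncells - w * block)) for w in range(nwriters)}
    for chunk in chunks:
        for gid, v in chunk:
            w = gid // block
            buffers[w][gid - w * block] = v
    return buffers, block


def write_variable(values: List[float], nprocs: int, nwriters: int) -> List[float]:
    """Simulate writers filling a shared 'file' at offsets set only by the global
    index, so the result does not depend on nprocs or nwriters."""
    ncells = len(values)
    buffers, block = aggregate(decompose(values, nprocs), ncells, nwriters)
    file_image = [0.0] * ncells
    for w, buf in buffers.items():
        start = w * block  # each writer owns one contiguous slab of the file
        file_image[start:start + len(buf)] = buf
    return file_image


if __name__ == "__main__":
    field = [float(i * i) for i in range(10)]
    for nprocs in (1, 3, 7):
        for nwriters in (1, 2, 4):
            assert write_variable(field, nprocs, nwriters) == field
    print("layout is independent of the decomposition")
```

Because each value's file offset is computed from its global index alone, rerunning with a different processor count or writer count yields the same file contents. Letting a few writers each own one contiguous slab is one simple way to get large, non-overlapping writes into a single shared file instead of one file per processor.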