Efficient data IO for a Parallel Global Cloud Resolving Model

Authors:
Bruce Palmer;Annette Koontz;Karen Schuchardt;Ross Heikes;David Randall
Affiliations:
Computational Sciences and Mathematics Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA;Computational Sciences and Mathematics Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA;Computational Sciences and Mathematics Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA;Department of Atmospheric Sciences, Colorado State University, Fort Collins, CO 80523, USA;Department of Atmospheric Sciences, Colorado State University, Fort Collins, CO 80523, USA
Venue:
Environmental Modelling & Software
Year:
2011

Abstract

Execution of a Global Cloud Resolving Model (GCRM) at target resolutions of 2-4 km will generate, at a minimum, tens of gigabytes of data per variable per snapshot. Writing this data to disk without creating a serious bottleneck in the execution of the GCRM code, while also supporting efficient post-execution data analysis, is a significant challenge. This paper discusses an Input/Output (IO) application programmer interface (API) for the GCRM that efficiently moves data from the model to disk while maintaining support for community standard formats, avoiding the creation of very large numbers of files, and supporting efficient analysis. Several aspects of the API are discussed in detail. First, we discuss the output data layout, which linearizes the data in a consistent way that is independent of the number of processors used to run the simulation and provides a convenient format for subsequent analyses of the data. Second, we discuss the flexible API that enables modelers to easily add variables to the output stream by specifying where in the GCRM code these variables are located, and to flexibly configure the choice of outputs and the distribution of data across files. This flexibility is designed to allow model developers to add new data fields to the output as the model develops and new physics is added. It also lets users of the GCRM code adjust the output frequency and the number of fields written depending on the needs of individual calculations. Third, we describe the mapping to the NetCDF data model, with an emphasis on the grid description. Fourth, we describe the messaging algorithms and IO aggregation strategies used to achieve high bandwidth while writing concurrently from many processors to shared files. We conclude with initial performance results.
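
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind a processor-independent layout combined with IO aggregation. It is not the GCRM API. It assumes each cell has a fixed global index, that each process owns a contiguous block of cells, and that a small set of aggregators each writes one contiguous slice of the output. The names `decompose`, `aggregate_and_write`, and `simulate` are hypothetical, and message passing is simulated with plain Python lists rather than MPI.

```python
from typing import List, Tuple


def decompose(n_cells: int, n_parts: int) -> List[range]:
    """Split global cell indices 0..n_cells-1 into contiguous blocks."""
    base, extra = divmod(n_cells, n_parts)
    blocks, start = [], 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def aggregate_and_write(
    local_data: List[Tuple[List[int], List[float]]],
    n_cells: int,
    n_aggregators: int,
) -> List[float]:
    """Gather per-process data onto aggregators and 'write' a linear array.

    local_data holds one (global_indices, values) pair per process.
    The returned list stands in for the contents of a shared file.
    """
    shared_file: List[float] = [0.0] * n_cells
    for agg_range in decompose(n_cells, n_aggregators):
        # Each aggregator collects the values that fall in its slice...
        buffer = {}
        for indices, values in local_data:
            for g, v in zip(indices, values):
                if g in agg_range:
                    buffer[g] = v
        # ...and writes them in global-index order as one contiguous chunk.
        for g in agg_range:
            shared_file[g] = buffer[g]
    return shared_file


def simulate(n_cells: int, n_procs: int, n_aggregators: int) -> List[float]:
    """Distribute a synthetic field over n_procs, then aggregate and write."""
    field = [0.5 * i for i in range(n_cells)]  # synthetic global field
    local_data = [
        (list(block), [field[g] for g in block])
        for block in decompose(n_cells, n_procs)
    ]
    return aggregate_and_write(local_data, n_cells, n_aggregators)
```

In this sketch, `simulate(20, 3, 2)` and `simulate(20, 7, 4)` return the same list, because the position of each value in the output depends only on its global index and not on how the cells were distributed across processes or aggregators.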