Parallel high-resolution climate data analysis using Swift

  • Authors:
  • Matthew Woitaszek; John M. Dennis; Taleena R. Sines

  • Affiliations:
  • National Center for Atmospheric Research, Boulder, CO, USA; National Center for Atmospheric Research, Boulder, CO, USA; Frostburg State University, Frostburg, MD, USA

  • Venue:
  • Proceedings of the 2011 ACM international workshop on Many task computing on grids and supercomputers
  • Year:
  • 2011

Abstract

Advances in software parallelism and high-performance systems have resulted in an order-of-magnitude increase in the volume of output data produced by the Community Earth System Model (CESM). As this volume grows, the single-threaded, script-based software packages traditionally used to post-process model output have become a bottleneck in the analysis process. This paper presents a parallel version of the CESM atmosphere model data analysis workflow implemented in the Swift scripting language. With the Swift implementation, the time to analyze a 10-year atmosphere simulation on a typical cluster drops from 95 minutes to 32 minutes on a single 8-core node (roughly a 3x speedup) and to 20 minutes on two nodes (roughly 4.75x). The parallelized workflow is then used to evaluate several new data-intensive computational systems that feature RAM-based and flash-based storage. Even when parallelism is constrained to limit the file system space consumed by intermediate temporary data, our results show that the Swift-based implementation significantly reduces data analysis time.
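The abstract's key trade-off is running many independent post-processing tasks at once while capping concurrency so that intermediate files do not exhaust scratch space. The sketch below illustrates that idea in Python rather than in Swift, and it is not the authors' workflow. The function names `run_bounded` and `process_month`, the per-month task granularity, and the synthetic data are all hypothetical. `max_parallel` stands in for the kind of parallelism limit the paper describes. Because at most `max_parallel` tasks run at once and each task deletes its temporary file before finishing, at most that many intermediate files exist at any moment.

```python
import concurrent.futures
import os
import tempfile
import threading


def run_bounded(months, max_parallel):
    """Run one independent task per month, with at most max_parallel tasks
    (and therefore at most max_parallel intermediate files) active at once.

    Returns (results, peak_concurrency). Hypothetical illustration only.
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def process_month(month):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            # Hypothetical stand-in for one post-processing step: write an
            # intermediate file of synthetic daily values, then reduce it.
            with tempfile.NamedTemporaryFile(
                "w+", suffix=".txt", delete=False
            ) as tmp:
                tmp.write(" ".join(str(month * day) for day in range(1, 31)))
                path = tmp.name
            with open(path) as f:
                values = [int(x) for x in f.read().split()]
            os.remove(path)  # free scratch space as soon as the task ends
            return sum(values) / len(values)  # monthly mean
        finally:
            with lock:
                active -= 1

    # The pool size is the throttle: it bounds concurrent tasks and temp files.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(process_month, months))
    return results, peak
```

For example, `run_bounded(list(range(1, 13)), max_parallel=3)` returns the twelve monthly means in input order, because `pool.map` preserves ordering. The recorded peak concurrency never exceeds 3. Raising `max_parallel` trades more scratch space for shorter wall-clock time, which is the balance the paper evaluates.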