Experiences using smaash to manage data-intensive simulations

  • Authors:
  • Randy Hudson; John Norris; Lynn B. Reid; Klaus Weide; G. Cal Jordan IV; Michael E. Papka

  • Affiliations:
  • Flash Center for Computational Science, Chicago, IL, USA; Flash Center for Computational Science, Chicago, IL, USA; WA Geothermal Centre of Excellence, Kensington, Australia; Flash Center for Computational Science, Chicago, IL, USA; Flash Center for Computational Science, Chicago, IL, USA; Computation Institute, Argonne National Laboratory, Chicago, IL, USA

  • Venue:
  • Proceedings of the 20th International Symposium on High Performance Distributed Computing
  • Year:
  • 2011

Abstract

High-performance scientific computer simulations created with systems such as the University of Chicago's FLASH code generate enormous amounts of data that must be captured, cataloged, and analyzed. Unless this is done formally, monitoring such simulations, tracking and reproducing old ones, and analyzing and archiving their output can be haphazard and idiosyncratic. Smaash, a simulation management and analysis system developed at the University of Chicago and Argonne National Laboratory, seeks to solve some of these problems by offering something close to a single point of control and analysis, a metadata-base, and a set of tools that automate some of what scientists have been doing by hand. Smaash was designed to be independent of the particular simulation code and is accessible from many computing platforms. It is automatic and standardized, and was built using open source software tools. Data security is considered throughout the process, yet users are insulated from onerous verification procedures. Because the system was developed with feedback from scientific users, its user interface reflects how scientists work day to day. We describe our system and a typical simulation it was designed to support. We illustrate its utility with several examples from our experience of freeing scientists from the data manipulation phase so that they can focus on the computational results and the analysis of high-performance computing.