Workflow management for high volume supernova search

  • Authors: Cecilia R. Aragon; Karl J. Runge
  • Affiliations: Lawrence Berkeley National Lab, Berkeley, CA; Space Sciences Laboratory, Berkeley, CA
  • Venue: Proceedings of the 2009 ACM symposium on Applied Computing
  • Year: 2009

Abstract

Observational astrophysics has recently become a data-intensive science after many decades of relative data poverty. As a result, many of the algorithms developed for processing astronomical data, although well established for low-volume data capture, do not scale well to today's high-volume sky surveys and transient searches. Specifically, problems may occur with data transfer, workflow management, efficient parallelization, and integration of legacy code. Observational astrophysics workflows present computational challenges unique in high performance computing, including 24/7 operations, time-critical processing, and very large numbers of relatively small data files which must all be processed and archived. We present a case study based on Sunfall, a distributed, parallel scientific workflow system we built for the Nearby Supernova Factory, the largest data-volume supernova search currently in existence. We describe innovative techniques for data transfer and workflow management, and discuss lessons learned in building a large-scale observational astrophysics workflow management system.