Astrophysical simulations of protoplanetary disks and gas giant planet formation are performed with a variety of numerical methods, and some of the codes in use today have produced scientifically significant results for years or even decades. Each must evolve millions of resolution elements over millions of time steps, capture and store output data, and analyze that data rapidly and efficiently. Doing so effectively requires a parallel code that scales to tens or hundreds of processors, along with an efficient workflow for transporting, analyzing, and interpreting the output data. Because such simulations usually run on moderate to large parallel systems, the compute system is generally located at a remote institution. Analysis of results, however, is typically performed interactively, and since most supercomputing centers do not offer dedicated interactive nodes, simulation output must be transferred to local resources. Even where interactive resources are available, typical network latencies make X-forwarded displays nearly unusable. Because the data sets can be quite large and traditional transfer mechanisms such as scp and sftp offer relatively low throughput, this transfer of data sets becomes a bottleneck in the research workflow. In this article we measure the scalability of the Computational HYdrodynamics with MultiplE Radiation Algorithms (CHYMERA) code on the SGI Altix architecture and find that it scales well up to 64 threads for moderate and large problem sizes. We also present a novel approach that enables rapid transfer and analysis of simulation data via the Data Capacitor (DC) and Lustre WAN (Wide Area Network) [17]. Using a WAN file system to tie batch-operated compute resources to interactive analysis and visualization resources is of general interest and can be applied broadly.
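Scaling claims like "scales well up to 64 threads" are conventionally quantified as speedup (serial time over parallel time) and parallel efficiency (speedup over thread count). A minimal sketch of that bookkeeping follows; the thread counts and wall-clock timings below are purely illustrative and are not the paper's measurements.

```python
def speedup_and_efficiency(times):
    """Compute strong-scaling metrics from measured run times.

    times: dict mapping thread count -> wall-clock seconds.
    Returns a dict mapping thread count -> (speedup, efficiency),
    both relative to the smallest thread count measured.
    """
    n_ref = min(times)            # reference run (ideally 1 thread)
    t_ref = times[n_ref]
    result = {}
    for n, t in sorted(times.items()):
        speedup = t_ref / t       # how much faster than the reference
        efficiency = speedup * n_ref / n  # fraction of ideal linear scaling
        result[n] = (speedup, efficiency)
    return result

# Illustrative (made-up) timings for a fixed problem size:
metrics = speedup_and_efficiency({1: 6400.0, 16: 440.0, 64: 125.0})
for n, (s, e) in metrics.items():
    print(f"{n:3d} threads: speedup {s:6.2f}, efficiency {e:5.2f}")
```

An efficiency that stays near 1.0 out to 64 threads would correspond to the "scales well" result reported for moderate and large problem sizes; a sharp drop-off marks where communication or serial overheads begin to dominate.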