Efficient Runtime Environment for Coupled Multi-physics Simulations: Dynamic Resource Allocation and Load-Balancing

Authors:
Soon-Heum Ko;Nayong Kim;Joohyun Kim;Abhinav Thota;Shantenu Jha
Venue:
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Year:
2010

Citing 1

Numerical simulation of 3D fluid-structure interaction flow using an immersed object method with overlapping grids
Computers and Structures

Cited 8

Exploring the RNA folding energy landscape using scalable distributed cyberinfrastructure
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing

Building gateways for life-science applications using the dynamic application runtime environment (DARE) framework
Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery

Adaptive Executions of Multi-Physics Coupled Applications on Batch Grids
Journal of Grid Computing

Malleable Model Coupling with Prediction
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)

Running many molecular dynamics simulations on many supercomputers
Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond

The anatomy of successful ECSS projects: lessons of supporting high-throughput high-performance ensembles on XSEDE
Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond

Large improvements in application throughput of long-running multi-component applications using batch grids
Concurrency and Computation: Practice & Experience

Distributed Application Runtime Environment (DARE): A Standards-based Middleware Framework for Science-Gateways
Journal of Grid Computing

Abstract

Coupled multi-physics simulations, such as hybrid CFD-MD (computational fluid dynamics and molecular dynamics) simulations, represent an increasingly important class of scientific applications. The physical problems of interest often demand high-end computers, such as TeraGrid resources, which are typically accessible only via batch queues. Batch-queue systems are not designed to natively support the coordinated scheduling of jobs, which is needed for the concurrent execution that coupled multi-physics simulations require. In this paper we develop and demonstrate a novel approach to overcome the lack of native support for the coordinated job submission that coupled runs require. We establish the performance advantages arising from our solution, which is a generalization of the Pilot-Job concept; the concept is not new in and of itself, but it is applied here to coupled simulations for the first time. Our solution not only overcomes the initial co-scheduling problem, but also provides a dynamic resource allocation mechanism. Support for such dynamic resources is critical for a load-balancing mechanism, which we develop and demonstrate to be effective at reducing the total time-to-solution of the problem. We establish that the performance advantage of using Big Jobs is invariant with the size of the machine as well as the size of the physical model under investigation. The Pilot-Job abstraction is developed using SAGA, which provides an infrastructure-agnostic implementation that can seamlessly execute on and utilize distributed resources.
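
The load-balancing idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm and does not use the SAGA or Pilot-Job APIs. It assumes a fixed pool of cores held by one pilot job, two coupled sub-tasks (labelled CFD and MD here), and ideal strong scaling, so that the time per coupling interval is inversely proportional to the cores assigned. The function names `rebalance` and `step_time` and all numbers are hypothetical.

```python
def rebalance(total_cores, cores_a, time_a, time_b):
    """Return a new (cores_a, cores_b) split of a fixed core pool.

    Assumption (not from the paper): ideal strong scaling, so the work
    of a task in core-seconds is its measured time multiplied by the
    cores it held. Cores are then divided in proportion to that work,
    which equalizes the predicted time of the two coupled tasks.
    """
    cores_b = total_cores - cores_a
    work_a = time_a * cores_a  # core-seconds spent by task A
    work_b = time_b * cores_b  # core-seconds spent by task B
    new_a = round(total_cores * work_a / (work_a + work_b))
    # Each task must keep at least one core.
    new_a = max(1, min(total_cores - 1, new_a))
    return new_a, total_cores - new_a


def step_time(work, cores):
    """Predicted time per coupling interval under ideal scaling."""
    return work / cores


if __name__ == "__main__":
    total = 64                        # cores held by the pilot job (hypothetical)
    cfd, md = 32, 32                  # initial even split
    cfd_work, md_work = 480.0, 160.0  # hypothetical core-seconds per interval

    for interval in range(3):
        t_cfd = step_time(cfd_work, cfd)
        t_md = step_time(md_work, md)
        # The faster task idles until the slower one reaches the
        # coupling point, so the interval costs the larger of the two.
        print(interval, cfd, md, max(t_cfd, t_md))
        cfd, md = rebalance(total, cfd, t_cfd, t_md)
```

With these hypothetical numbers the split moves from 32/32 to 48/16 after the first interval, at which point both tasks are predicted to take the same time. A real code would scale less than ideally, so the measurement and reallocation would need to be repeated over several intervals.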