The uptake of distributed infrastructures by scientific applications has been limited by the lack of extensible, pervasive and simple-to-use abstractions, which are required at multiple levels: the development, deployment and execution stages of scientific applications. The Pilot-Job abstraction has been shown to address many requirements of scientific applications effectively. Specifically, Pilot-Jobs decouple workload submission from resource assignment; this results in a flexible execution model, which in turn enables the distributed scale-out of applications on multiple and possibly heterogeneous resources. Most Pilot-Job implementations, however, are tied to a specific infrastructure. In this paper, we describe the design and implementation of a SAGA-based Pilot-Job, which supports a wide range of application types and is usable over a broad range of infrastructures, i.e., it is general-purpose and extensible and, as we will argue, is also interoperable with Clouds. We discuss how the SAGA-based Pilot-Job is used for different application types and how it supports concurrent usage across multiple heterogeneous distributed infrastructures, including concurrent usage across Clouds and traditional Grids/Clusters. Further, we show how Pilot-Jobs can help to support dynamic execution models and thus introduce new opportunities for distributed applications. We also demonstrate, for the first time to our knowledge, the use of multiple Pilot-Job implementations to solve the same problem: we use the SAGA-based Pilot-Job on high-end resources such as the TeraGrid and the native Condor Pilot-Job (Glide-in) on Condor resources. Importantly, both are invoked via the same interface without changes at the development or deployment level; the choice between them is only an execution (run-time) decision.
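The decoupling of workload submission from resource assignment can be illustrated with a minimal, purely in-process sketch. It is not the SAGA-based Pilot-Job or the Condor Glide-in interface: the names `Pilot`, `PilotManager`, `add_pilot`, `submit` and `run`, as well as the resource labels, are hypothetical and chosen only for illustration. Tasks are queued without naming a resource, and are bound to whichever pilot has a free slot only when execution starts.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple


@dataclass
class Pilot:
    """Placeholder job that holds `cores` slots on one (hypothetical) resource."""
    resource: str
    cores: int
    completed: List[Tuple[str, object]] = field(default_factory=list)


class PilotManager:
    """Single interface in front of pilots on different infrastructures."""

    def __init__(self) -> None:
        self.pilots: List[Pilot] = []
        self.queue: Deque[Tuple[str, Callable[[], object]]] = deque()

    def add_pilot(self, resource: str, cores: int) -> Pilot:
        # Resource acquisition: happens independently of task submission.
        pilot = Pilot(resource, cores)
        self.pilots.append(pilot)
        return pilot

    def submit(self, name: str, work: Callable[[], object]) -> None:
        # Workload submission: the task does not name a resource.
        self.queue.append((name, work))

    def run(self) -> None:
        # Late binding: tasks are assigned to pilots only at execution time.
        if self.queue and not any(p.cores > 0 for p in self.pilots):
            raise RuntimeError("no pilot with free slots is available")
        while self.queue:
            for pilot in self.pilots:
                for _ in range(pilot.cores):
                    if not self.queue:
                        return
                    name, work = self.queue.popleft()
                    pilot.completed.append((name, work()))


def demo() -> Dict[str, List[str]]:
    manager = PilotManager()
    manager.add_pilot("hpc-like", cores=2)      # hypothetical resource label
    manager.add_pilot("condor-like", cores=1)   # hypothetical resource label
    for i in range(6):
        manager.submit(f"task-{i}", lambda i=i: i * i)
    manager.run()
    return {p.resource: [name for name, _ in p.completed] for p in manager.pilots}
```

In this sketch, adding or removing a pilot changes where tasks run without touching the code that submits them, which mirrors the run-time-only decision described above. A real pilot would execute tasks concurrently on remote resources; the sequential loop here only shows the binding order.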