Today's scientific workflows use distributed, heterogeneous resources through diverse grid and cloud interfaces that are often hard to program. Time-sensitive, critical applications in particular also need predictable quality of service across these distributed resources. VGrADS' virtual grid execution system (vgES) provides a uniform qualitative resource abstraction over grid and cloud systems. We apply vgES to schedule a set of deadline-sensitive weather forecasting workflows. Specifically, this paper reports on our experiences with (1) virtualized reservations for batch-queue systems, (2) coordinated usage of TeraGrid (batch queue), Amazon EC2 (cloud), our own clusters (batch queue), and Eucalyptus (cloud) resources, and (3) fault tolerance through automated task replication. Together, these techniques enabled a new workflow planning method that balances performance, reliability, and cost considerations. The results point toward improved resource selection and execution management support for a variety of e-Science applications over grid and cloud systems.