Workflows are used to orchestrate data-intensive applications in many different scientific domains. Workflow applications typically communicate data between processing steps using intermediate files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. As a result, the efficient management of data is a key factor in achieving good performance for workflow applications in distributed environments. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon's EC2 cloud computing platform. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.