Efficient data management is a key component in achieving good performance for scientific workflows in distributed environments. Workflow applications typically communicate data between tasks using files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. In grids and clusters, workflow data is often stored on network and parallel file systems. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon's EC2. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.