Data-intensive workflows process and generate large amounts of data. The strategies used to stage data in and out of compute resources can have a significant impact on the overall execution of a workflow. We study the relationships between data placement services, which perform the staging, and workflow managers, which control the release of computational jobs. We describe a framework that classifies data staging strategies into decoupled, loosely-coupled and tightly-coupled modes, based on the degree of their interaction with the workflow manager. We present the results of simulation studies that investigate the effect of these three staging modes on synthetic workflows resembling those from real-world scientific applications.