Optimizing workflow data footprint
Scientific Programming - Dynamic Computational Workflows: Discovery, Optimization and Scheduling
In this paper we examine the problem of optimizing disk usage and scheduling large-scale scientific workflows onto distributed resources, where the workflows are data-intensive and require large amounts of data storage, and where the resources have limited storage capacity. Our approach is two-fold: we minimize the amount of space a workflow requires during execution by removing data files at runtime when they are no longer required, and we schedule the workflows in a way that ensures that the data required and generated by the workflow fits onto the individual resources. For a workflow used by gravitational-wave physicists, we were able to reduce the amount of storage required by the workflow by up to 57%. We also designed an algorithm that not only finds feasible assignments of workflow tasks to resources in disk-space-constrained environments, but can also improve overall workflow performance.
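The effect of removing files at runtime can be illustrated with a small sketch. It is not the algorithm from the paper. It assumes a toy model in which tasks run one at a time on a single resource, in a given order, and every input that no task produces is staged in before execution starts. The function name `peak_footprint`, the task tuples, and the file sizes are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical task representation: (name, input files, output files).
Task = Tuple[str, List[str], List[str]]


def peak_footprint(tasks: List[Task], sizes: Dict[str, int], cleanup: bool) -> int:
    """Peak disk usage when tasks run sequentially in the given order.

    With cleanup=True, a file is deleted as soon as the last task that
    reads it has finished. Files that no task reads (final products)
    are always kept.
    """
    # Count how many tasks still need each file as an input.
    remaining: Dict[str, int] = {}
    for _, inputs, _ in tasks:
        for f in inputs:
            remaining[f] = remaining.get(f, 0) + 1

    # Assumption: inputs not produced by any task are staged in up front.
    produced = {f for _, _, outputs in tasks for f in outputs}
    on_disk = {f for _, inputs, _ in tasks for f in inputs if f not in produced}

    used = sum(sizes[f] for f in on_disk)
    peak = used
    for _, inputs, outputs in tasks:
        # Outputs are written while the task's inputs are still on disk.
        for f in outputs:
            if f not in on_disk:
                on_disk.add(f)
                used += sizes[f]
        peak = max(peak, used)
        # After the task finishes, drop inputs that nobody else needs.
        for f in inputs:
            remaining[f] -= 1
            if cleanup and remaining[f] == 0 and f in on_disk:
                on_disk.remove(f)
                used -= sizes[f]
    return peak


# Made-up three-stage pipeline; sizes are in arbitrary units.
example_tasks: List[Task] = [
    ("t1", ["raw"], ["a"]),
    ("t2", ["a"], ["b"]),
    ("t3", ["b"], ["out"]),
]
example_sizes = {"raw": 10, "a": 10, "b": 10, "out": 5}

print(peak_footprint(example_tasks, example_sizes, cleanup=False))  # 35
print(peak_footprint(example_tasks, example_sizes, cleanup=True))   # 20
```

In this toy pipeline the peak drops from 35 to 20 units because each intermediate file is deleted once its only consumer has run. A scheduler in the spirit of the abstract would additionally have to check that such a peak fits within the storage limit of each resource it assigns tasks to.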