Scientific workflows are a means of defining and orchestrating large, complex, multi-stage computations that perform data analysis and/or simulation. Task clustering is a runtime optimization technique that merges multiple short workflow tasks into a single job, reducing per-job execution overhead and thereby improving the overall runtime performance of the workflow. However, current task clustering strategies neglect two imbalance problems: imbalance in task runtimes and imbalance in task dependencies. In our work, we first investigate the distinct causes of runtime imbalance and dependency imbalance. We then introduce a series of metrics, building on our prior work, to measure the severity of each type of imbalance. Finally, we study a wide range of real scientific workflows to characterize the relationship between these metrics and the choice of balancing method.
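As a rough illustration of the runtime-imbalance problem (a sketch, not the paper's actual algorithms or metrics), the snippet below clusters the tasks of one horizontal workflow level into a fixed number of jobs in two ways, and scores each clustering with a simple imbalance measure: the coefficient of variation of the resulting job runtimes (0 means perfectly balanced). The function names and the round-robin/greedy strategies are illustrative assumptions.

```python
from statistics import pstdev, mean

def cluster_round_robin(runtimes, k):
    """Naive horizontal clustering: assign tasks of one workflow
    level to k jobs in arrival order, ignoring task runtimes."""
    jobs = [[] for _ in range(k)]
    for i, rt in enumerate(runtimes):
        jobs[i % k].append(rt)
    return jobs

def cluster_balanced(runtimes, k):
    """Runtime-aware clustering: greedy longest-task-first packing,
    always adding the next task to the currently shortest job."""
    jobs = [[] for _ in range(k)]
    for rt in sorted(runtimes, reverse=True):
        min(jobs, key=sum).append(rt)
    return jobs

def runtime_imbalance(jobs):
    """Coefficient of variation of job runtimes; larger values
    indicate more severe runtime imbalance across clustered jobs."""
    totals = [sum(job) for job in jobs]
    return pstdev(totals) / mean(totals)
```

With a skewed level such as task runtimes `[9, 8, 7, 1, 1, 1]` and two jobs, round-robin packing yields jobs of total runtime 17 and 10, while the greedy variant yields 15 and 12, giving a lower imbalance score. Dependency imbalance is harder to sketch this compactly, since it depends on the structure of edges between levels rather than on runtimes alone.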