Graph-Cut Based Coscheduling Strategy Towards Efficient Execution of Scientific Workflows in Collaborative Cloud Environments

Authors:
Kefeng Deng; Junqiang Song; Kaijun Ren; Dong Yuan; Jinjun Chen
Venue:
GRID '11 Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing
Year:
2011

Abstract

Recently, cloud computing has emerged as a promising computing infrastructure for running scientific workflows by providing on-demand resources. It also facilitates scientific collaboration, since the different cloud environments used by researchers are connected through the Internet. However, the significant latency caused by frequent access to large datasets, and by the resulting data movement across geo-distributed data centers, has hindered the efficient execution of data-intensive scientific workflows. In this paper, we propose a novel graph-cut based data and task co-scheduling strategy that minimizes data transfer across geo-distributed data centers. Specifically, a dependency graph is first constructed from workflow provenance and partitioned into subgraphs by a multiway cut algorithm, according to the datasets that must reside in fixed data centers. Each subgraph may then be recursively split into smaller ones by a minimum cut algorithm, guided by data correlation rules, until every subgraph fits the capacity constraints of the data center where its fixed-location datasets reside. In this way, datasets and tasks are distributed to target data centers while the total volume of data transferred between them is minimized. Additionally, a runtime scheduling algorithm dynamically adjusts data placement during execution to prevent data centers from overloading. Simulation results demonstrate that the total volume of data transferred across data centers can be significantly reduced, and that the cost of running scientific workflows in the cloud is reduced accordingly.
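The abstract does not say which multiway cut algorithm partitions the dependency graph. As a minimal sketch, assuming the graph is given as undirected weighted edges whose weight is the data volume that would cross data centers if the two endpoints were placed apart, the Python below swaps in the isolating-cut heuristic of Dahlhaus et al., a known (2 - 2/k)-approximation. For each fixed-location dataset (terminal), it computes a minimum cut separating that terminal from all other terminals. It keeps all but the heaviest of these cuts and gives every remaining vertex to the heaviest terminal's data center. The function names, graph encoding, and toy data are hypothetical. The recursive capacity-driven min-cut splitting and the runtime rescheduling step are not shown.

```python
from collections import defaultdict, deque

def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp max flow on a dict-of-dicts capacity map.
    Returns (flow value, source side of a minimum s-t cut)."""
    residual = defaultdict(lambda: defaultdict(float))
    for u in cap:
        for v, c in cap[u].items():
            residual[u][v] += c
            residual[v][u] += 0.0  # make sure the reverse arc exists
    flow = 0.0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # Vertices still reachable from s form the minimal source side
            return flow, set(parent)
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

def isolating_cut_multiway(edges, terminals):
    """Hypothetical helper: isolating-cut heuristic for multiway cut.
    edges: (u, v, weight) triples; terminals: fixed-location datasets.
    Returns a dict mapping every vertex to the terminal it is placed with."""
    cap = defaultdict(dict)
    for u, v, w in edges:  # undirected: add capacity in both directions
        cap[u][v] = cap[u].get(v, 0.0) + w
        cap[v][u] = cap[v].get(u, 0.0) + w
    sink = ("__sink__",)  # artificial super-sink, distinct from vertex names
    results = []
    for t in terminals:
        g = {u: dict(nbrs) for u, nbrs in cap.items()}
        g.setdefault(sink, {})
        for other in terminals:  # merge all other terminals into the sink
            if other != t:
                g.setdefault(other, {})[sink] = float("inf")
        value, side = max_flow_min_cut(g, t, sink)
        results.append((value, t, side))
    results.sort(key=lambda r: r[0])  # ascending isolating-cut weight
    heaviest = results[-1][1]
    assignment = {}
    for _, t, side in results[:-1]:
        for node in side:
            assignment.setdefault(node, t)  # on overlap, the lighter cut wins
    for node in cap:
        assignment.setdefault(node, heaviest)  # leftovers join the heaviest terminal
    return assignment

def cut_weight(edges, assignment):
    """Total weight of edges whose endpoints land in different data centers."""
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])

# Toy graph: A, B, C are fixed-location datasets; x and y are movable.
edges = [("A", "x", 6), ("x", "B", 2), ("x", "y", 3),
         ("y", "C", 5), ("B", "y", 1), ("A", "C", 1)]
placement = isolating_cut_multiway(edges, ["A", "B", "C"])
print(sorted(placement.items()))  # [('A', 'A'), ('B', 'B'), ('C', 'C'), ('x', 'A'), ('y', 'C')]
print(cut_weight(edges, placement))  # 7
```

On this toy graph the heuristic happens to find the optimal placement: x joins A's data center, y joins C's, and 7 units of data cross data center boundaries. In the strategy described above, each resulting subgraph would then be checked against its data center's capacity and split further if needed.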