Cloud computing has recently emerged as a promising infrastructure for running scientific workflows because it provides resources on demand. It also suits scientific collaboration, since the different cloud environments used by researchers are connected through the Internet. However, frequent access to large datasets, and the resulting data movement across geo-distributed data centers, introduces significant latency that hinders the efficient execution of data-intensive scientific workflows. In this paper, we propose a novel graph-cut-based data and task co-scheduling strategy that minimizes data transfer across geo-distributed data centers. Specifically, a dependency graph is first constructed from workflow provenance and then cut into subgraphs by a multiway cut algorithm, according to the datasets that must reside in fixed data centers. The subgraphs may then be recursively cut into smaller ones by a minimum cut algorithm, guided by data correlation rules, until each fits the capacity constraints of the data center where its fixed-location datasets reside. In this way, the datasets and tasks are distributed to target data centers while the total amount of data transfer between them is minimized. Additionally, a runtime scheduling algorithm dynamically adjusts the data placement during execution to prevent data centers from overloading. Simulation results demonstrate that the total volume of data transfer across data centers can be significantly reduced, and the cost of running scientific workflows on clouds reduced accordingly.
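
The first partitioning step can be illustrated with a small sketch. The abstract does not say which multiway cut algorithm is used, so the code below assumes the classic isolating-cut heuristic: compute a minimum cut that isolates each fixed-location dataset from the others, then keep all but the heaviest cut. The graph, the node names (`fixedA`, `t1`, `d1`, ...), the edge weights (data volumes), and the helper functions are all hypothetical and chosen only for illustration. The recursive minimum-cut refinement under capacity constraints and the runtime adjustment are omitted.

```python
from collections import deque

def min_cut_source_side(cap, source, sinks):
    """Edmonds-Karp max flow from `source` to a super sink joined to `sinks`.
    Returns (cut value, nodes still reachable from `source` in the residual graph)."""
    res = {u: dict(nbrs) for u, nbrs in cap.items()}  # residual capacities
    sink = "_super_sink"
    res[sink] = {}
    for t in sinks:
        res[t][sink] = float("inf")
        res[sink][t] = 0
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow, set(parent)  # no path left: reachable set is the source side
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] = res[v].get(u, 0) + push
        flow += push

def isolating_cut_partition(cap, terminals):
    """Assign every node to one terminal using the isolating-cut heuristic (assumed here)."""
    cuts = {t: min_cut_source_side(cap, t, [x for x in terminals if x != t])
            for t in terminals}
    heaviest = max(terminals, key=lambda t: cuts[t][0])
    assignment = {}
    for t in terminals:
        if t != heaviest:
            for node in cuts[t][1]:
                assignment[node] = t
    for node in cap:  # everything left over goes with the heaviest terminal
        assignment.setdefault(node, heaviest)
    return assignment

def transfer_volume(edges, assignment):
    """Total weight of edges whose endpoints land in different partitions."""
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])

# Hypothetical dependency graph: tasks (t*), movable datasets (d*), and one
# fixed-location dataset per data center; weights stand for data volume.
edges = [
    ("fixedA", "t1", 10), ("t1", "d1", 8), ("d1", "t2", 1),
    ("fixedB", "t2", 9), ("t2", "d2", 7), ("d2", "t3", 2),
    ("fixedC", "t3", 6),
]
cap = {}
for u, v, w in edges:  # undirected graph: capacity in both directions
    cap.setdefault(u, {})[v] = w
    cap.setdefault(v, {})[u] = w

assignment = isolating_cut_partition(cap, ["fixedA", "fixedB", "fixedC"])
print(assignment)
print("cross-data-center transfer:", transfer_volume(edges, assignment))
```

In this toy graph the heuristic cuts only the two lightest dependencies, so each task stays in the same partition as the data it exchanges the most volume with. A fuller treatment would then check each partition against its data center's capacity and split oversized ones further.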