Data driven workflow planning in cluster management systems

Authors:
Srinath Shankar;David J. DeWitt
Affiliations:
University of Wisconsin;University of Wisconsin
Venue:
Proceedings of the 16th international symposium on High performance distributed computing
Year:
2007

References: 17 (first list below)
Cited by: 6 (second list below)

Scheduling precedence graphs in systems with interprocessor communication times

SIAM Journal on Computing
Zoo: a desktop experiment management environment

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Benchmarking and comparison of the task graph scheduling algorithms

Journal of Parallel and Distributed Computing
LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies

IEEE Transactions on Computers
The Gamma Database Machine Project

IEEE Transactions on Knowledge and Data Engineering
Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation

SSDBM '02 Proceedings of the 14th International Conference on Scientific and Statistical Database Management
Giggle: a framework for constructing scalable replica location services

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Pipeline and Batch Sharing in Grid Workloads

HPDC '03 Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing
Farsite: federated, available, and reliable storage for an incompletely trusted environment

ACM SIGOPS Operating Systems Review - OSDI '02: Proceedings of the 5th symposium on Operating systems design and implementation
Improving Data Availability through Dynamic Model-Driven Replication in Large Peer-to-Peer Communities

CCGRID '02 Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid
Stork: Making Data Placement a First Class Citizen in the Grid

ICDCS '04 Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS'04)
Scheduling of scientific workflows in the ASKALON grid environment

ACM SIGMOD Record
Efficient scheduling and execution of scientific workflow tasks

SSDBM'2005 Proceedings of the 17th international conference on Scientific and statistical database management
Co-scheduling of computation and data on computer clusters

SSDBM'2005 Proceedings of the 17th international conference on Scientific and statistical database management
Task scheduling strategies for workflow-based applications in grids

CCGRID '05 Proceedings of the Fifth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'05) - Volume 2
Explicit control in a batch-aware distributed file system

NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
GridDB: a data-centric overlay for scientific grids

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

DIMM: a distributed metadata management for data-intensive HPC environments

DADC '08 Proceedings of the 2008 international workshop on Data-aware distributed computing
Clustera: an integrated computation and data management system

Proceedings of the VLDB Endowment
File Clustering Based Replication Algorithm in a Grid Environment

CCGRID '09 Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid
Data Staging Strategies and Their Impact on the Execution of Scientific Workflows

Proceedings of the second international workshop on Data-aware distributed computing
Access-pattern and bandwidth aware file replication algorithm in a grid environment

GRID '08 Proceedings of the 2008 9th IEEE/ACM International Conference on Grid Computing
Schedule optimization for data processing flows on the cloud

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

Abstract

Traditional scientific computing has been associated with harnessing computation cycles within and across clusters of machines. In recent years, scientific applications have become increasingly data-intensive. This is especially true in the fields of astronomy and high energy physics. Furthermore, the lowered cost of disks and commodity machines has led to a dramatic increase in the amount of free disk space spread across machines in a cluster. This space is not being exploited by traditional distributed computing tools. In this paper we have evaluated ways to improve the data management capabilities of Condor, a popular distributed computing system. We have augmented the Condor system by providing the capability to store data used and produced by workflows on the disks of machines in the cluster. We have also replaced the Condor matchmaker with a new workflow planning framework that is cognizant of dependencies between jobs in a workflow and exploits these new data storage capabilities to produce workflow schedules. We show that our data caching and workflow planning framework can significantly reduce response times for data-intensive workflows by reducing data transfer over the network in a cluster. We also consider ways in which this planning framework can be made adaptive in a dynamic, multi-user, failure-prone environment.
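
The abstract's central idea, placing each job of a workflow where its input data is already cached, can be illustrated with a minimal sketch. This is not the authors' planner and it does not use any Condor API. The function `plan_workflow`, its parameters, and the file and machine names are all hypothetical. It assumes a greedy policy that visits jobs in dependency order and picks the machine that would need the fewest bytes shipped over the network. It ignores load balancing, disk capacity, and failures.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_workflow(jobs, deps, inputs, outputs, sizes, cached, machines):
    """Greedy data-aware placement (illustrative assumption, not the paper's algorithm).

    jobs:     iterable of job names
    deps:     {job: set of jobs that must finish first}
    inputs:   {job: list of input file names}
    outputs:  {job: list of output file names}
    sizes:    {file: size in bytes}
    cached:   {machine: set of files already on its local disk}
    machines: list of machine names
    Returns ({job: machine}, total bytes moved over the network).
    """
    # Work on a copy so the caller's cache description is not mutated.
    cached = {m: set(files) for m, files in cached.items()}
    # Visit jobs so that every job comes after the jobs it depends on.
    order = TopologicalSorter({j: deps.get(j, set()) for j in jobs}).static_order()

    schedule = {}
    transferred = 0
    for job in order:
        def missing_bytes(machine):
            # Bytes that would have to be fetched if the job ran on this machine.
            return sum(sizes[f] for f in inputs.get(job, []) if f not in cached[machine])

        best = min(machines, key=missing_bytes)
        transferred += missing_bytes(best)
        # After the job runs, its inputs and outputs sit on that machine's disk.
        cached[best].update(inputs.get(job, []))
        cached[best].update(outputs.get(job, []))
        schedule[job] = best
    return schedule, transferred


if __name__ == "__main__":
    # Hypothetical two-job pipeline: "analyze" consumes what "extract" produces.
    schedule, moved = plan_workflow(
        jobs=["extract", "analyze"],
        deps={"analyze": {"extract"}},
        inputs={"extract": ["raw.dat"], "analyze": ["clean.dat"]},
        outputs={"extract": ["clean.dat"], "analyze": ["result.dat"]},
        sizes={"raw.dat": 100, "clean.dat": 40, "result.dat": 1},
        cached={"m1": set(), "m2": {"raw.dat"}},
        machines=["m1", "m2"],
    )
    print(schedule, moved)
```

In this toy run both jobs land on `m2`, the machine that already holds `raw.dat`, so no bytes cross the network. A planner that ignored data location could instead ship the 100-byte input and the 40-byte intermediate file between machines.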