Reliable workflow scheduling with less resource redundancy

Authors:
Laiping Zhao, Yizhi Ren, Kouichi Sakurai
Venue:
Parallel Computing
Year:
2013

Abstract

We examine the problem of reliable workflow scheduling with less resource redundancy. Because scheduling workflow applications on heterogeneous systems is NP-complete, whether the goal is to maximize reliability or to minimize makespan, we instead seek schedules that meet specified reliability and deadline requirements. First, we analyze the reliability of a given schedule using two key definitions: Accumulated Processor Reliability (APR) and Accumulated Communication Reliability (ACR). Second, building on this analysis, we present three scheduling algorithms: the RR algorithm schedules the fewest Resources needed to meet the Reliability requirement; the DRR algorithm extends RR by also taking the Deadline requirement into account; and the dynamic algorithm schedules tasks at run time, which avoids the "Chain effect" caused by uncertainty in task execution-time estimates and reduces the impact of inaccurate failure estimates. Finally, our empirical evaluation shows that, while achieving reliability similar to that of the Fault-Tolerant Scheduling Algorithm (FTSA), our algorithms save a significant amount of computation and communication resources.
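The abstract does not spell out the reliability model behind APR, ACR, or the RR algorithm. As a rough illustration only, the sketch below assumes the exponential failure model that is common in this literature (a constant per-processor failure rate λ, so a replica running for time t succeeds with probability exp(-λt)), independent replica failures, and a task that succeeds if any one of its replicas succeeds. Under those assumptions it picks the fewest replicas for a single task that still meet a reliability target. The names `Processor`, `replica_reliability`, `replicated_reliability`, `per_task_target`, and `fewest_replicas` are hypothetical; this is not the paper's RR algorithm, and it ignores communication reliability (ACR) and deadlines.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    """Hypothetical processor model (an assumption, not taken from the paper)."""
    name: str
    failure_rate: float  # lambda: assumed constant failures per time unit
    speed: float = 1.0   # relative speed; execution time = work / speed


def replica_reliability(work: float, proc: Processor) -> float:
    """Probability that one replica finishes before its processor fails,
    assuming exponentially distributed failures: exp(-lambda * t)."""
    return math.exp(-proc.failure_rate * work / proc.speed)


def replicated_reliability(work: float, procs: list[Processor]) -> float:
    """A replicated task succeeds if at least one replica succeeds;
    replica failures are assumed to be independent."""
    all_fail = 1.0
    for p in procs:
        all_fail *= 1.0 - replica_reliability(work, p)
    return 1.0 - all_fail


def per_task_target(workflow_target: float, n_tasks: int) -> float:
    """Split a workflow-level target evenly across n independent tasks:
    if every task reaches R ** (1/n), their product reaches R."""
    return workflow_target ** (1.0 / n_tasks)


def fewest_replicas(work: float, procs: list[Processor],
                    target: float) -> list[Processor]:
    """Greedily add replicas on the most reliable processors until the task
    meets `target`; returns [] if all processors together fall short."""
    ranked = sorted(procs, key=lambda p: replica_reliability(work, p),
                    reverse=True)
    chosen: list[Processor] = []
    for p in ranked:
        chosen.append(p)
        if replicated_reliability(work, chosen) >= target:
            return chosen
    return []


if __name__ == "__main__":
    procs = [Processor("p1", 1e-3), Processor("p2", 2e-3),
             Processor("p3", 5e-3)]
    print([p.name for p in fewest_replicas(100, procs, 0.98)])  # ['p1', 'p2']
    print(fewest_replicas(100, procs, 0.999))                   # []
```

With three processors whose failure rates are 1e-3, 2e-3 and 5e-3 and a task of 100 work units, a target of 0.98 is met by p1 and p2 together (combined reliability about 0.983), while 0.999 cannot be met even with all three. Ranking processors by replica reliability keeps the count minimal for a single task: for any fixed number of replicas k, 1 - prod(1 - r_i) is largest when the k most reliable processors are used.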