Reliability-aware scheduling strategy for heterogeneous distributed computing systems

Authors:
Xiaoyong Tang;Kenli Li;Renfa Li;Bharadwaj Veeravalli
Affiliations:
School of Computer and Communication, Hunan University, Changsha, 410082, China;School of Computer and Communication, Hunan University, Changsha, 410082, China;School of Computer and Communication, Hunan University, Changsha, 410082, China;Department of Electrical and Computer Engineering, The National University of Singapore, 117576, Singapore
Venue:
Journal of Parallel and Distributed Computing
Year:
2010

Abstract

Heterogeneous computing systems are promising computing platforms, since systems based on a single parallel architecture may not be sufficient to exploit the parallelism available in running applications. In some cases, heterogeneous distributed computing (HDC) systems can achieve higher performance at lower cost than single-machine supersystems. However, processors and networks in HDC systems are not failure free, and any kind of failure may be critical to the running applications. One way of dealing with such failures is to employ a reliable scheduling algorithm. Unfortunately, most existing scheduling algorithms for precedence-constrained tasks in HDC systems do not adequately consider the reliability requirements of inter-dependent tasks. In this paper, we design a reliability-driven scheduling architecture that can effectively measure system reliability, based on an optimal reliability communication path search algorithm. We then introduce the reliability priority rank (RRank), which estimates a task's priority by taking reliability overheads into account. Furthermore, using a directed acyclic graph (DAG) model of the application, we propose a reliability-aware scheduling algorithm for precedence-constrained tasks that can achieve high reliability for applications. Comparison studies, based on both randomly generated graphs and the graphs of some real applications, show that our scheduling algorithm outperforms existing scheduling algorithms in terms of makespan, scheduling length ratio, and reliability. Moreover, the improvement gained by our algorithm increases as the data communication among tasks increases.
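The abstract names an optimal reliability communication path search but does not give its details. As a rough illustration only, and not the authors' algorithm, the sketch below shows one standard way to find a most-reliable path between two processors. It assumes each directed link carries an independent success probability in (0, 1], so the reliability of a path is the product of its link reliabilities. Maximizing that product is the same as minimizing the sum of -log(reliability), which Dijkstra's algorithm can do. The function name `most_reliable_path`, the `links` data layout, and the processor labels are all hypothetical.

```python
import heapq
import math


def most_reliable_path(links, source, target):
    """Return (path, reliability) for the most reliable route from source to target.

    links: dict mapping a node to a list of (neighbor, reliability) pairs,
    with 0 < reliability <= 1 (assumed independent per link; hypothetical layout).
    Returns (None, 0.0) if target is unreachable.
    """
    # Dijkstra over edge weights -log(reliability), which are non-negative.
    best_cost = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    visited = set()

    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, reliability in links.get(node, []):
            new_cost = cost - math.log(reliability)
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))

    if target not in best_cost:
        return None, 0.0

    # Walk the predecessor chain back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return path, math.exp(-best_cost[target])


# Usage: the direct link p1 -> p2 is less reliable than the detour through p3.
example_links = {
    "p1": [("p2", 0.90), ("p3", 0.99)],
    "p3": [("p2", 0.98)],
}
route, route_reliability = most_reliable_path(example_links, "p1", "p2")
```

In this toy network the two-hop route is preferred because 0.99 × 0.98 exceeds 0.90. A scheduler in the spirit of the abstract could use such a path reliability when weighing where to place communicating tasks, although how the paper combines it with RRank is not stated here.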