Distributed workflow mapping algorithm for maximized reliability under end-to-end delay constraint

Authors:
Fei Cao;Michelle M. Zhu
Affiliations:
Department of Computer Science, Southern Illinois University, Carbondale, USA 62901;Department of Computer Science, Southern Illinois University, Carbondale, USA 62901
Venue:
The Journal of Supercomputing
Year:
2013

Abstract

A distributed scientific workflow mapping algorithm is proposed that maximizes reliability under a given end-to-end delay (EED) bound. The algorithm targets heterogeneous distributed computing environments, where failures of computing nodes and communication links are inevitable. Both the mapping decisions and the stored table information are distributed among the nodes to achieve scalability and robustness, which are especially important for large-scale distributed systems. The Distributed Reliability Maximization workflow mapping algorithm under End-to-end Delay constraint (dis-DRMED) addresses both objectives, maximum reliability and minimum EED, in a two-step procedure. In the first step, a mapping algorithm that combines iterative Critical Path search with Layer-based priority assignment (CPL) minimizes the EED by focusing on the optimal allocation of tasks on the critical path. In the second step, tasks on noncritical paths are remapped to improve the overall execution reliability. Simulation results under various system setups show that dis-DRMED achieves considerably higher reliability under the same EED constraint than several representative workflow mapping algorithms.
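
The two-step idea can be illustrated with a small centralized sketch. It is not the dis-DRMED algorithm itself: the abstract gives no pseudocode, so everything below is an assumption made for illustration. In particular, the first step simply maps each task to its fastest node as a stand-in for CPL, communication delays and link failures are ignored, nodes are assumed to run tasks without contention, and reliability is modeled as a product of exp(-failure_rate * execution_time) terms. All names (`succ`, `exec_time`, `fail_rate`, `remap_noncritical`) are hypothetical.

```python
import math

def topo_order(succ):
    """Return the tasks of a DAG in topological order (Kahn's algorithm)."""
    indeg = {t: 0 for t in succ}
    for t in succ:
        for s in succ[t]:
            indeg[s] += 1
    ready = [t for t in succ if indeg[t] == 0]
    order = []
    while ready:
        t = ready.pop()
        order.append(t)
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return order

def eed_and_critical_path(succ, mapping, exec_time):
    """Return (longest-path delay, list of tasks on that path)."""
    start, finish, pred = {}, {}, {}
    for t in topo_order(succ):
        finish[t] = start.get(t, 0.0) + exec_time[t][mapping[t]]
        for s in succ[t]:
            if s not in start or finish[t] > start[s]:
                start[s] = finish[t]
                pred[s] = t
    last = max(finish, key=finish.get)
    path = [last]
    while path[-1] in pred:
        path.append(pred[path[-1]])
    return finish[last], path[::-1]

def reliability(mapping, exec_time, fail_rate):
    """Assumed model: each task survives with probability exp(-rate * time)."""
    exponent = sum(fail_rate[n] * exec_time[t][n] for t, n in mapping.items())
    return math.exp(-exponent)

def remap_noncritical(succ, mapping, exec_time, fail_rate, bound):
    """Step 2 (sketch): move noncritical tasks to nodes that raise
    reliability, as long as the delay stays within the bound."""
    mapping = dict(mapping)
    _, critical = eed_and_critical_path(succ, mapping, exec_time)
    for t in succ:
        if t in critical:
            continue
        for node in exec_time[t]:
            trial = {**mapping, t: node}
            eed, _ = eed_and_critical_path(succ, trial, exec_time)
            better = (reliability(trial, exec_time, fail_rate)
                      > reliability(mapping, exec_time, fail_rate))
            if eed <= bound and better:
                mapping = trial
    return mapping

# Toy workflow: a -> {b, c} -> d, with two hypothetical nodes.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
exec_time = {
    "a": {"fast": 2.0, "safe": 4.0},
    "b": {"fast": 4.0, "safe": 8.0},
    "c": {"fast": 1.0, "safe": 3.0},
    "d": {"fast": 2.0, "safe": 4.0},
}
fail_rate = {"fast": 0.02, "safe": 0.001}

# Step 1 (stand-in for CPL): map every task to its fastest node.
initial = {t: min(exec_time[t], key=exec_time[t].get) for t in succ}
eed_bound, critical_path = eed_and_critical_path(succ, initial, exec_time)

# Step 2: remap noncritical tasks without exceeding the step-1 delay.
final = remap_noncritical(succ, initial, exec_time, fail_rate, eed_bound)
final_eed, _ = eed_and_critical_path(succ, final, exec_time)
r_before = reliability(initial, exec_time, fail_rate)
r_after = reliability(final, exec_time, fail_rate)
```

In this toy instance only task `c` lies off the critical path, so it is the only task moved to the slower but more reliable node; the slack on its path absorbs the extra execution time and the end-to-end delay is unchanged.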