A Multi-objective Approach for Workflow Scheduling in Heterogeneous Environments
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Multi-objective list scheduling of workflow applications in distributed computing infrastructures
Journal of Parallel and Distributed Computing
Latency, fault tolerance, and reliability are important requirements for many time-critical applications: such applications require latency guarantees even when processors are subject to failures. In this paper, we propose a fault-tolerant scheduling heuristic for mapping precedence task graphs onto heterogeneous systems. Our approach is based on an active replication scheme that supports ε arbitrary fail-silent/fail-stop processor failures, so valid results are provided even if ε processors fail. First, we focus on a bi-criteria approach, in which we aim at minimizing the latency given a fixed number of failures supported in the system or, conversely, at maximizing the number of supported failures given a fixed latency. Next, we derive a more complex algorithm that not only minimizes latency and supports a fixed number of failures, but also improves the overall reliability. Major achievements include the low complexity of the new algorithms and a drastic reduction in the number of additional communications induced by the replication mechanism. Experimental results demonstrate that our heuristics, despite their lower complexity, outperform their direct competitor, the fault-tolerance based active replication scheduling algorithm FTBAR.
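To make the active replication idea concrete, the following is a minimal sketch and not the heuristic proposed in the paper. It assumes that tolerating ε fail-stop failures means running each task on ε + 1 distinct processors, and it uses a plain greedy earliest-finish-time rule. Communication costs are ignored, and the function name `schedule_with_replication`, its parameters, and the sample task graph are hypothetical, chosen only for illustration.

```python
def schedule_with_replication(tasks, preds, cost, num_procs, eps):
    """Greedy list scheduling with active replication (illustrative only).

    tasks     -- task names in topological order (assumption: caller sorts them)
    preds     -- dict: task -> list of predecessor tasks
    cost      -- dict: task -> list of execution times, one per processor
    num_procs -- number of heterogeneous processors
    eps       -- number of fail-stop processor failures to tolerate
    """
    if eps + 1 > num_procs:
        raise ValueError("need at least eps + 1 processors")

    proc_free = [0.0] * num_procs   # time at which each processor becomes idle
    replicas = {}                   # task -> list of (processor, start, finish)

    for t in tasks:
        # Pessimistic data-ready time: after eps failures, the only surviving
        # replica of a predecessor may be the one that finishes last.
        ready = max(
            (max(f for _, _, f in replicas[p]) for p in preds.get(t, [])),
            default=0.0,
        )
        # Finish time of t on every processor (communication costs ignored).
        candidates = []
        for p in range(num_procs):
            start = max(ready, proc_free[p])
            candidates.append((start + cost[t][p], start, p))
        # Keep the eps + 1 processors with the earliest finish times.
        chosen = sorted(candidates)[: eps + 1]
        replicas[t] = [(p, s, f) for f, s, p in chosen]
        for f, _, p in chosen:
            proc_free[p] = f

    # Upper bound on latency: the latest finish time over all replicas.
    latency = max(f for reps in replicas.values() for _, _, f in reps)
    return replicas, latency


# Hypothetical three-task fork graph on three heterogeneous processors.
tasks = ["a", "b", "c"]
preds = {"b": ["a"], "c": ["a"]}
cost = {"a": [2, 3, 4], "b": [3, 2, 5], "c": [4, 4, 1]}

replicas, latency = schedule_with_replication(tasks, preds, cost, 3, eps=1)
```

Running the sketch with `eps=0` and then `eps=1` on the same graph shows the trade-off between the two criteria named in the abstract: supporting more failures adds replicas that occupy processors and lengthens the latency bound.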