Optimizing performance and reliability on heterogeneous parallel systems: Approximation algorithms and heuristics

Authors:
Emmanuel Jeannot; Erik Saule; Denis Trystram
Affiliations:
INRIA Bordeaux Sud-Ouest, Talence, France; BMI, Ohio State University, Columbus 43210, OH, USA; Grenoble Institute of Technology, Grenoble, France and Institut Universitaire de France, France
Venue:
Journal of Parallel and Distributed Computing
Year:
2012

Abstract

We study the problem of scheduling tasks (with and without precedence constraints) on a set of related processors, each of which has a probability of failure governed by an exponential law. The goal is to design approximation algorithms or heuristics that optimize both makespan and reliability. First, we show that the two objectives are contradictory and that the number of points on the Pareto front can be exponential, which means that the problem cannot be approximated by a single schedule. Second, for independent unitary tasks, we provide an optimal scheduling algorithm whose objective is to maximize reliability subject to makespan minimization. For the bi-objective optimization, we provide a (1+ε,1)-approximation algorithm of the Pareto front. Next, for independent arbitrary tasks, we propose a (2,1)-approximation algorithm (i.e., for any fixed value of the makespan, the obtained solution is optimal with respect to reliability and its makespan is no more than twice the given value) whose complexity is much lower than that of existing algorithms. This solution is used to derive a (2+ε,1)-approximation of the Pareto front of the problem. All of these solutions distinguish processors by the value of the product {failure rate} × {unitary instruction execution time}, which turns out to be a crucial parameter in bi-objective optimization. Based on this observation, we provide a general method for converting scheduling heuristics for heterogeneous clusters into heuristics that also take reliability into account when there are precedence constraints. The average behavior of these heuristics is studied through extensive simulations. Finally, we discuss the specific case of scheduling a chain of tasks, for which optimal results are obtained.
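To make the unitary-task case concrete, here is a minimal sketch under stated assumptions; it is not the authors' code. It assumes that processor j executes one unit task in time tau_j, fails at rate lambda_j, and fails independently, so that a schedule with per-processor completion times C_j succeeds with probability exp(-Σ_j lambda_j C_j). Under that model, for a fixed makespan bound M, processor j can hold at most ⌊M/tau_j⌋ unit tasks. Filling processors in increasing order of lambda_j·tau_j then minimizes Σ_j lambda_j C_j, which follows from an exchange argument. To approximate the Pareto front, the sketch sweeps M over a geometric grid with ratio 1+eps, starting from the minimum makespan. This is a standard construction and not necessarily the paper's exact procedure. The function names `min_makespan`, `max_reliability`, `evaluate`, and `approx_pareto_front` are hypothetical.

```python
import heapq
import math


def min_makespan(n, taus):
    """Minimum makespan for n unit tasks on related processors (hypothetical helper).
    Each task goes to the processor on which it would finish earliest."""
    heap = [(tau, j) for j, tau in enumerate(taus)]  # (finish time if one more task is added, processor)
    heapq.heapify(heap)
    makespan = 0
    for _ in range(n):
        finish, j = heapq.heappop(heap)
        makespan = max(makespan, finish)
        heapq.heappush(heap, (finish + taus[j], j))
    return makespan


def max_reliability(n, taus, lambdas, bound):
    """Task counts per processor minimizing sum_j lambda_j * C_j with every C_j <= bound.
    Assumed model: processors are filled by increasing lambda_j * tau_j.
    Returns None if the bound is below the minimum makespan."""
    counts = [0] * len(taus)
    remaining = n
    for j in sorted(range(len(taus)), key=lambda j: lambdas[j] * taus[j]):
        take = min(remaining, math.floor(bound / taus[j]))  # capacity of processor j under the bound
        counts[j] = take
        remaining -= take
        if remaining == 0:
            break
    return counts if remaining == 0 else None


def evaluate(counts, taus, lambdas):
    """Makespan and reliability exp(-sum_j lambda_j * C_j) of an assignment (assumed model)."""
    loads = [c * t for c, t in zip(counts, taus)]
    return max(loads), math.exp(-sum(load * lam for load, lam in zip(loads, lambdas)))


def approx_pareto_front(n, taus, lambdas, eps):
    """Sweep makespan bounds m_min * (1 + eps)^k and keep the non-dominated points."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    m_min = min_makespan(n, taus)
    m_max = n * max(taus)  # no schedule can have a larger makespan
    points = []
    bound = m_min
    while True:
        counts = max_reliability(n, taus, lambdas, bound)
        points.append(evaluate(counts, taus, lambdas))
        if bound >= m_max:  # every processor can now hold all n tasks
            break
        bound *= 1 + eps
    # Scan by increasing makespan; keep a point only if it is strictly more reliable.
    front = []
    for cmax, rel in sorted(set(points), key=lambda p: (p[0], -p[1])):
        if not front or rel > front[-1][1]:
            front.append((cmax, rel))
    return front


if __name__ == "__main__":
    taus = [1, 2, 4]              # hypothetical unitary execution times
    lambdas = [0.04, 0.01, 0.001]  # hypothetical failure rates
    for cmax, rel in approx_pareto_front(6, taus, lambdas, eps=0.5):
        print(f"makespan={cmax:>3}  reliability={rel:.4f}")
```

In this toy instance, the fastest processor has the largest lambda·tau. Each step along the front therefore trades a longer makespan for higher reliability by moving work toward the slow but reliable processor, which illustrates why the two objectives conflict.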