Reliability versus performance for critical applications

Authors:
Alain Girault; Érik Saule; Denis Trystram
Affiliations:
INRIA, Grenoble Rhône-Alpes, POP ART team, France; INPG - LIG, Grenoble, France; INPG - LIG, Grenoble, France
Venue:
Journal of Parallel and Distributed Computing
Year:
2009

Abstract

Applications implemented on critical systems are subject to both safety-critical and real-time constraints. Classically, such applications are specified as precedence task graphs that must be scheduled onto a given heterogeneous multiprocessor target architecture. We propose a new method for simultaneously optimizing two objectives: the execution time and the reliability of the schedule. The problem is decomposed into two successive steps: a spatial allocation, during which the reliability is maximized (randomized algorithm), and a scheduling, during which the makespan is minimized (list scheduling algorithm). This decomposition allows us to produce several trade-off solutions, among which the user can choose the one that best fits the application's requirements. Reliability is increased by replicating suitable tasks onto well-chosen processors. Our fault model assumes that processors are fail-silent, that they are subject to transient failures, and that failure occurrences follow a Poisson law with a constant parameter. We assess and validate our method by running extensive simulations on both random graphs and actual application graphs. The simulations show that our method is competitive, in terms of makespan, with existing reference scheduling methods for heterogeneous processors (HEFT), while providing better reliability.
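
The fault model and the trade-off idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes the textbook reading of a constant-parameter Poisson law (a replica running for time d on a processor with failure rate λ succeeds with probability exp(-λd)), assumes that failures are independent, and assumes that a replicated task fails only when all of its replicas fail. All function names and the numeric values in the usage note are hypothetical.

```python
import math


def replica_reliability(failure_rate, duration):
    """Probability that no failure occurs during `duration` under a
    Poisson process with constant rate `failure_rate` (assumed model)."""
    return math.exp(-failure_rate * duration)


def task_reliability(replicas):
    """Reliability of one task given its replicas.

    `replicas` is a list of (failure_rate, duration) pairs, one per
    processor hosting a copy. With fail-silent processors and
    independent failures (assumption), the task is lost only if every
    replica fails.
    """
    all_fail = 1.0
    for rate, duration in replicas:
        all_fail *= 1.0 - replica_reliability(rate, duration)
    return 1.0 - all_fail


def schedule_reliability(allocation):
    """Product of the task reliabilities over the whole allocation.

    `allocation` maps a task name to its list of replicas.
    """
    result = 1.0
    for replicas in allocation.values():
        result *= task_reliability(replicas)
    return result


def pareto_front(solutions):
    """Keep the (makespan, reliability) pairs that no other pair dominates.

    Lower makespan is better, higher reliability is better.
    """
    front = []
    for m, r in solutions:
        dominated = any(
            (m2 <= m and r2 >= r) and (m2 < m or r2 > r)
            for m2, r2 in solutions
        )
        if not dominated:
            front.append((m, r))
    return sorted(front)
```

For instance, `task_reliability([(1e-3, 10), (1e-3, 10)])` is higher than `task_reliability([(1e-3, 10)])`, which reflects how replication raises reliability at the price of extra processor time. Filtering candidate schedules with `pareto_front` leaves only the trade-off solutions among which a user would choose.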