Applications implemented on critical systems are subject to both safety-critical and real-time constraints. Classically, such applications are specified as precedence task graphs that must be scheduled onto a given heterogeneous multiprocessor target architecture. We propose a new method for simultaneously optimizing two objectives: the execution time and the reliability of the schedule. The problem is decomposed into two successive steps: a spatial allocation step, during which the reliability is maximized by a randomized algorithm, and a scheduling step, during which the makespan is minimized by a list scheduling algorithm. This decomposition allows us to produce several trade-off solutions, among which the user can choose the one that best fits the application's requirements. Reliability is increased by replicating suitable tasks onto well-chosen processors. Our fault model assumes that processors are fail-silent, that they are subject to transient failures, and that failure occurrences follow a Poisson law with a constant parameter. We assess and validate our method by running extensive simulations on both random graphs and actual application graphs. The simulations show that the method is competitive, in terms of makespan, with an existing reference scheduling method for heterogeneous processors (HEFT), while providing better reliability.
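To make the fault model concrete, the following minimal sketch shows how reliability could be computed under a Poisson law with a constant parameter, and why replication increases it. It is an illustration under stated assumptions, not the method evaluated above: the function names, the independence of failures across processors and tasks, and the rule that a replicated task succeeds when at least one replica succeeds (consistent with fail-silent processors) are all assumptions introduced here.

```python
import math


def task_reliability(failure_rate, exec_time):
    """Probability that no failure occurs during exec_time.

    Assumes failures follow a Poisson law with constant rate failure_rate,
    so the probability of zero failures in the interval is exp(-rate * time).
    """
    return math.exp(-failure_rate * exec_time)


def replicated_reliability(replicas):
    """Reliability of one task replicated on several processors.

    replicas is an iterable of (failure_rate, exec_time) pairs, one per
    replica. Assumption: replicas fail independently, and the task succeeds
    if at least one replica completes without a failure.
    """
    p_all_fail = 1.0
    for failure_rate, exec_time in replicas:
        p_all_fail *= 1.0 - task_reliability(failure_rate, exec_time)
    return 1.0 - p_all_fail


def schedule_reliability(allocation):
    """Reliability of a whole allocation.

    allocation is a list with one replica list per task. Assumption: the
    schedule succeeds only if every task succeeds, and tasks fail
    independently, so the per-task reliabilities are multiplied.
    """
    reliability = 1.0
    for replicas in allocation:
        reliability *= replicated_reliability(replicas)
    return reliability
```

With hypothetical values, `replicated_reliability([(1e-4, 50.0)])` gives the reliability of a single copy of a task, and adding a second pair to the list gives the reliability with one extra replica. The second value is never lower than the first, which is the effect that replication exploits, at the price of extra processor load that can lengthen the makespan.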