This paper deals with the reliability of task graph schedules with transient and fail-stop failures. While computing the reliability of a given schedule is easy in the absence of task replication, the problem becomes much more difficult when task replication is used. We fill a complexity gap in the scheduling literature: our main result is that this reliability problem is #P'-Complete (hence at least as hard as NP-Complete problems), both for transient and for fail-stop processor failures. We also study the evaluation of a restricted class of schedules, in which a task cannot be scheduled before all replicas of all its predecessors have completed their execution. Although the complexity of this restricted case with fail-stop failures remains open, we provide an algorithm to estimate the reliability while limiting evaluation costs, and we validate this approach through simulations.
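To illustrate why the case without replication is easy, here is a minimal sketch. It assumes a failure model the abstract does not spell out: each processor fails independently according to a Poisson process with a constant rate, so a task of a given duration succeeds with probability exp(-rate × duration), and the schedule succeeds only if every task does. The function name, the data layout, and the numeric rates are hypothetical and chosen only for illustration.

```python
import math

def schedule_reliability(tasks, failure_rate):
    """Reliability of a schedule WITHOUT task replication.

    tasks: list of (processor, duration) pairs, one pair per task.
    failure_rate: dict mapping processor name to an assumed constant
                  failure rate (failures per time unit).

    Assumption (not from the abstract): exponential, independent failures,
    so the schedule reliability is the product of per-task success
    probabilities exp(-rate * duration).
    """
    log_reliability = 0.0
    for processor, duration in tasks:
        # Summing in log space avoids underflow when there are many tasks.
        log_reliability -= failure_rate[processor] * duration
    return math.exp(log_reliability)

# Hypothetical example: two processors, three tasks.
failure_rate = {"p1": 1e-3, "p2": 5e-4}
tasks = [("p1", 10.0), ("p2", 20.0), ("p1", 5.0)]
reliability = schedule_reliability(tasks, failure_rate)
print(f"reliability = {reliability:.5f}")
```

Under these assumptions the computation is a single pass over the tasks. With replication, the success of a task depends on which replicas of its predecessors succeeded, so no such simple product is available in general.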