Dynamic Voltage and Frequency Scaling (DVFS) has been widely used to manage energy in real-time embedded systems. However, recent work has shown that DVFS directly and adversely affects system reliability, because transient fault rates increase as supply voltage and frequency are lowered. In this work, we investigate static and dynamic reliability-aware energy management schemes that minimize energy consumption for periodic real-time systems while preserving system reliability. Focusing on earliest deadline first (EDF) scheduling, we first show that the static version of the problem is NP-hard and propose two task-level, utilization-based heuristics. Next, building on the idea of wrapper-tasks, we develop a job-level online scheme that efficiently monitors and manages dynamic slack in reliability-aware settings, and we formally prove its feasibility. Finally, we present two integrated approaches that reclaim both static and dynamic slack at runtime. To preserve system reliability, the proposed schemes incorporate recovery tasks/jobs into the schedule as needed, while still using the remaining slack for energy savings. We evaluate the proposed schemes through extensive simulations. The results confirm that all of them preserve system reliability, whereas ordinary, reliability-ignorant energy management schemes drastically reduce it. The energy savings of the static heuristics are within 5 percent of those achieved by an optimal solution. By effectively exploiting runtime slack, the dynamic schemes achieve additional energy savings while still preserving system reliability.
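The static, task-level idea can be illustrated with a small Python sketch. The sketch assumes a fault-rate and power model of the kind commonly used in reliability-aware DVFS studies. In that model, the fault rate grows exponentially as the normalized frequency f drops, λ(f) = λ0 · 10^(d(1 - f)/(1 - f_min)), and active power is P_ind + C_ef · f^m. All parameter values, the `Task` type, and the helper names (`select_tasks`, `reliability_preserved`, and so on) are hypothetical. Each managed task receives a recovery task with the same period and WCET, reserved at maximum frequency. The remaining spare utilization then slows down the managed primaries so that the EDF bound (U - X) + X + X/f <= 1 still holds, where U is the total utilization and X is the utilization of the managed tasks. The largest-utilization-first greedy selection is only an illustrative choice. It is not claimed to be either of the paper's two heuristics, and the sketch covers only the static scheme, not the wrapper-task online scheme.

```python
import math
from dataclasses import dataclass

# Illustrative model parameters (assumptions, not values from the paper).
LAMBDA0 = 1e-6  # transient-fault rate at maximum frequency (faults per time unit)
D = 2.0         # sensitivity of the fault rate to frequency scaling
F_MIN = 0.1     # minimum normalized frequency
P_IND = 0.1     # frequency-independent active power
C_EF = 1.0      # effective switching capacitance
M = 3.0         # exponent of frequency-dependent power

# Energy-efficient frequency: running slower than this wastes energy.
F_EE = max(F_MIN, (P_IND / (C_EF * (M - 1))) ** (1 / M))


@dataclass
class Task:
    name: str
    wcet: float    # worst-case execution time at maximum frequency
    period: float  # period, equal to the relative deadline

    @property
    def util(self) -> float:
        return self.wcet / self.period


def fault_rate(f: float) -> float:
    """Exponential model: fault rate grows as normalized frequency f drops."""
    return LAMBDA0 * 10 ** (D * (1 - f) / (1 - F_MIN))


def job_reliability(wcet: float, f: float) -> float:
    """Probability that a job with this WCET runs fault-free at frequency f."""
    return math.exp(-fault_rate(f) * wcet / f)


def energy_rate(util: float, f: float) -> float:
    """Active energy per time unit for utilization `util` executed at f."""
    return (P_IND + C_EF * f ** M) * util / f


def scaled_frequency(x: float, u_total: float) -> float:
    """Lowest frequency for managed primaries such that
    (u_total - x) + x (recovery reserve) + x / f <= 1 under EDF."""
    return max(F_EE, x / (1 - u_total))


def total_energy(x: float, u_total: float) -> float:
    """Fault-free energy rate; recovery jobs are assumed to run rarely."""
    if x == 0:
        return energy_rate(u_total, 1.0)
    f = scaled_frequency(x, u_total)
    return energy_rate(u_total - x, 1.0) + energy_rate(x, f)


def select_tasks(tasks):
    """Hypothetical greedy: consider tasks by decreasing utilization and
    manage a task if its recovery fits and total energy decreases."""
    u_total = sum(t.util for t in tasks)
    selected, x = [], 0.0
    best = total_energy(0.0, u_total)
    for t in sorted(tasks, key=lambda t: t.util, reverse=True):
        cand = x + t.util
        if cand > 1 - u_total:  # recovery reserve must fit in spare capacity
            continue
        e = total_energy(cand, u_total)
        if e < best:
            selected.append(t)
            x, best = cand, e
    f = scaled_frequency(x, u_total) if selected else 1.0
    return selected, f, best


def reliability_preserved(task: Task, f: float) -> bool:
    """A managed job fails only if the scaled primary AND the full-speed
    recovery both fail; compare with the unmanaged reliability."""
    r0 = job_reliability(task.wcet, 1.0)
    r_managed = 1 - (1 - job_reliability(task.wcet, f)) * (1 - r0)
    return r_managed >= r0


TASKS = [Task("T1", 2, 10), Task("T2", 3, 20), Task("T3", 1, 10)]
selected, f, energy = select_tasks(TASKS)
print([t.name for t in selected], round(f, 3), round(energy, 3))
```

A managed job fails only if both its scaled primary and its full-speed recovery fail. Its reliability, 1 - (1 - R(f))(1 - R0), therefore never falls below the unmanaged reliability R0, so the check passes however low f is. The energy estimate deliberately ignores recovery jobs, which execute only when a fault actually occurs.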