In this paper, we study how to execute a real-time frame-based task set on DVFS-enabled processors so that the system's reliability requirement is guaranteed and the energy consumed in executing the task set is minimized. To meet the reliability requirement, processing resources are reserved for re-executing tasks when transient faults occur. However, commonly used approaches share the reserved processing resources among all tasks in the task set. We instead judiciously select a subset of tasks to share these reserved resources for recovery. In addition, we formally prove that, for a given task set, the system achieves the highest reliability and consumes the least energy when the task set is executed at a uniform frequency (or at the neighboring available frequencies if the desired frequency is not available). Based on this theoretical result, we directly calculate the optimal execution frequency for the task set. This replaces the heuristic search for an appropriate execution frequency for each individual task that much prior work uses. Our simulation results show that our approach guarantees the same level of system reliability as other fault-recovery-based approaches in the literature. It also improves energy savings over those approaches by up to 15%. Furthermore, because our approach does not require frequent frequency changes, it is particularly effective on processors whose frequency-switching overhead is large and not negligible.
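The frequency-calculation step summarized above can be sketched as follows. This is a minimal illustration built on several assumptions that the abstract does not state:

- Frequencies are normalized, with the maximum frequency equal to 1.0.
- Each task's workload is given as its execution time at that maximum frequency.
- Dynamic power is proportional to f³, so energy per unit of work scales with f².
- The recovery reserve follows a hypothetical shared-recovery model. It is one slot run at full speed, sized to the longest task in the protected subset, and tolerates at most one transient fault per frame.

The abstract does not say how the protected subset is chosen, so the sketch takes it as an input. All function names are illustrative. The sketch covers only the energy side of the uniform-frequency result, not the reliability proof.

```python
import math
from bisect import bisect_left


def recovery_reserve(work, protected):
    """Hypothetical shared-recovery model: a single reserved slot, executed at
    normalized frequency 1.0, long enough to re-execute the longest task in
    the protected subset (assumes at most one transient fault per frame)."""
    return max((work[i] for i in protected), default=0.0)


def uniform_frequency(work, deadline, reserve, f_min=0.0):
    """Lowest single normalized frequency that fits every primary job into
    the time left after the recovery reserve is set aside."""
    budget = deadline - reserve
    if budget <= 0:
        raise ValueError("recovery reserve leaves no time for primary jobs")
    f = sum(work) / budget
    if f > 1.0:
        raise ValueError("task set is infeasible even at the maximum frequency")
    return max(f, f_min)


def split_between_levels(total_work, budget, levels):
    """Map the ideal uniform frequency onto discrete levels.

    Returns (frequency, work) pairs. If the ideal frequency is not available,
    part of the work runs at the neighboring lower level and the rest at the
    neighboring higher level, so the total execution time equals the budget."""
    levels = sorted(levels)
    ideal = total_work / budget
    if ideal <= levels[0]:
        return [(levels[0], total_work)]  # slowest level already fits
    for f in levels:
        if math.isclose(f, ideal):
            return [(f, total_work)]  # ideal frequency is available
    i = bisect_left(levels, ideal)
    if i == len(levels):
        raise ValueError("no available level is fast enough")
    f_lo, f_hi = levels[i - 1], levels[i]
    # Solve x / f_lo + (total_work - x) / f_hi == budget for x.
    x = (budget - total_work / f_hi) / (1.0 / f_lo - 1.0 / f_hi)
    return [(f_lo, x), (f_hi, total_work - x)]


def dynamic_energy(schedule):
    """Dynamic energy under the assumed P ~ f**3 model: f**2 per unit of work.
    Frequency-independent power is ignored."""
    return sum(f ** 2 * w for f, w in schedule)


def exec_time(schedule):
    """Wall-clock time needed to run each (frequency, work) segment."""
    return sum(w / f for f, w in schedule)


if __name__ == "__main__":
    work = [2.0, 3.0, 1.0]  # execution time of each task at f = 1.0
    deadline = 10.0
    reserve = recovery_reserve(work, protected={0, 1})  # 3.0
    f_star = uniform_frequency(work, deadline, reserve)  # 6/7, about 0.857
    plan = split_between_levels(sum(work), deadline - reserve, [0.4, 0.6, 0.8, 1.0])
    print(plan)  # about [(0.8, 4.0), (1.0, 2.0)]: 5 + 2 = 7 time units
    print(dynamic_energy([(f_star, sum(work))]), dynamic_energy(plan))
```

The model's power function is convex, and the sketch uses that property in two ways. For a fixed time budget, running at the single continuous frequency f* uses the least dynamic energy. A non-uniform schedule with the same total time uses more; for example, running half the work at 0.75 and half at 1.0 also takes 7 time units but costs more energy. When f* is not an available level, splitting the work between the two neighboring levels is the usual way to approximate it. That split also needs at most one frequency switch for the primary jobs, which fits the abstract's point about switching overhead.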