Redundancy is the traditional technique for increasing system reliability. With modern technology, slack time can serve as temporal redundancy, and energy management schemes can also use it to scale down system processing speed and supply voltage to save energy. In this paper, we consider a system that consists of multiple servers for providing reliable service. Assuming that servers have self-detection mechanisms to detect faults, we first propose an efficient parallel recovery scheme that processes service requests in parallel, which increases the number of faults that can be tolerated and thus the system reliability. Then, for a given request arrival rate, we explore the optimal number of active servers needed either to minimize system energy consumption while achieving k-fault tolerance or to maximize the number of faults tolerated under a limited energy budget. Analytical results are presented to show the trade-off between the energy savings and the number of faults being tolerated.
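
The two optimization questions in the abstract can be illustrated with a minimal sketch. It is not the paper's analytical model: it assumes a unit-length period, a normalized frequency in (0, 1], a per-server power of p_static + c_ef * f^3, and that tolerating k faults means reserving time to redo k requests of cost recovery_cost each. All function names, parameters, and default values are hypothetical.

```python
def energy_per_period(n, workload, k, recovery_cost, p_static=0.1, c_ef=1.0):
    """Energy of n active servers over one unit-length period, or None if infeasible.

    Assumed model (not taken from the paper): the work, plus time reserved
    for k recoveries, is spread evenly over n servers, and each server runs
    at the lowest normalized frequency that still finishes within the period.
    """
    freq = (workload + k * recovery_cost) / n
    if freq > 1.0:
        return None  # even full speed cannot fit the work plus recovery slack
    return n * (p_static + c_ef * freq ** 3)


def best_server_count(max_servers, workload, k, recovery_cost, **power):
    """Return (n, energy) minimizing energy under k-fault tolerance, or None."""
    feasible = {}
    for n in range(1, max_servers + 1):
        energy = energy_per_period(n, workload, k, recovery_cost, **power)
        if energy is not None:
            feasible[n] = energy
    if not feasible:
        return None
    n_best = min(feasible, key=feasible.get)
    return n_best, feasible[n_best]


def max_faults_within_budget(max_servers, workload, recovery_cost, budget, **power):
    """Largest k whose minimum energy fits the budget; -1 if even k = 0 does not."""
    if recovery_cost <= 0:
        raise ValueError("recovery_cost must be positive for the search to terminate")
    k = -1
    while True:
        result = best_server_count(max_servers, workload, k + 1, recovery_cost, **power)
        if result is None or result[1] > budget:
            return k
        k += 1


if __name__ == "__main__":
    print(best_server_count(8, workload=2.0, k=1, recovery_cost=0.5))
    print(max_faults_within_budget(8, workload=2.0, recovery_cost=0.5, budget=1.5))
```

Under this assumed model, adding servers lowers the frequency each one needs (cutting dynamic energy) but adds static energy, so the minimum-energy server count sits between the two extremes. Raising k pushes the required frequency up, which is the energy versus fault-tolerance trade-off the abstract refers to.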