Dynamic Fault-Tolerance and Metrics for Battery Powered, Failure-Prone Systems

  • Authors:
  • Phillip Stanley-Marbell; Diana Marculescu

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • Proceedings of the 2003 IEEE/ACM International Conference on Computer-Aided Design
  • Year:
  • 2003

Abstract

Emerging VLSI technologies and platforms are giving rise to systems with inherently high potential for runtime failure. Such failures range from intermittent electrical and mechanical failures at the system level, to device failures at the chip level. Techniques to provide reliable computation in the presence of failures must do so while maintaining high performance, with an eye toward energy efficiency. When possible, they should maximize battery lifetime in the face of battery discharge non-linearities. This paper introduces the concept of adaptive fault-tolerance management for failure-prone systems, and a classification of local algorithms for achieving system-wide reliability. In order to judge the efficacy of the proposed algorithms for dynamic fault-tolerance management, a set of metrics, for characterizing system behavior in terms of energy efficiency, reliability, computation performance and battery lifetime, is presented. For an example platform employed in a realistic evaluation scenario, it is shown that system configurations with the best performance and lifetime are not necessarily those with the best combination of performance, reliability, battery lifetime and average power consumption.