Reliability modeling and management in dynamic microprocessor-based systems

  • Authors:
  • Eric Karl;David Blaauw;Dennis Sylvester;Trevor Mudge

  • Affiliations:
  • University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI

  • Venue:
  • Proceedings of the 43rd annual Design Automation Conference
  • Year:
  • 2006

Abstract

Reliability failure mechanisms, such as time-dependent dielectric breakdown, electromigration, and thermal cycling, have become a key concern in processor design. The traditional approach to reliability qualification assumes that the processor will operate at maximum performance continuously under worst-case voltage and temperature conditions. However, the typical processor spends only a very small fraction of its operational time at maximum voltage and temperature. In this paper, we show that this gap creates a reliability "slack" that can be leveraged to provide increased performance during periods of peak processor demand. We develop a novel, real-time reliability model based on workload-driven conditions. We then propose a new dynamic reliability management (DRM) scheme that yields a 20-35% performance improvement during periods of peak computational demand while ensuring the required reliability lifetime.
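
The idea of reliability "slack" can be illustrated with a minimal sketch. This is not the model from the paper: it assumes a generic wear rate with an Arrhenius-style temperature term and an exponential voltage term, normalized to 1.0 at the worst-case qualification point. The function names (`wear_rate`, `reliability_slack`, `can_boost`) and every constant (`v_max`, `t_max_k`, `ea_ev`, `gamma`) are hypothetical values chosen only for illustration.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def wear_rate(voltage, temp_k, v_max=1.2, t_max_k=378.0, ea_ev=0.7, gamma=10.0):
    """Wear rate relative to worst-case conditions (assumed model).

    Returns 1.0 at (v_max, t_max_k), less than 1.0 for milder conditions,
    and more than 1.0 when voltage or temperature exceeds the worst case.
    All default parameters are illustrative assumptions.
    """
    thermal = math.exp(-ea_ev / BOLTZMANN_EV * (1.0 / temp_k - 1.0 / t_max_k))
    electrical = math.exp(gamma * (voltage - v_max))
    return thermal * electrical


def reliability_slack(history):
    """Slack in worst-case-equivalent hours.

    history is a list of (hours, voltage, temp_k) intervals. Qualification
    budgets one unit of wear per elapsed hour; the slack is the part of
    that budget the actual workload has not consumed.
    """
    elapsed = sum(hours for hours, _, _ in history)
    consumed = sum(hours * wear_rate(v, t) for hours, v, t in history)
    return elapsed - consumed


def can_boost(history, boost_hours, boost_v, boost_t_k):
    """True if the accumulated slack covers a boost above worst case."""
    extra = boost_hours * (wear_rate(boost_v, boost_t_k) - 1.0)
    return extra <= reliability_slack(history)


if __name__ == "__main__":
    # 1000 hours at a mild operating point, then a proposed 10-hour boost.
    past = [(1000.0, 1.0, 330.0)]
    print("slack (worst-case-equivalent hours):", reliability_slack(past))
    print("boost allowed:", can_boost(past, 10.0, 1.3, 388.0))
```

Under these assumptions, time spent below the worst-case point accumulates slack, and a management policy could spend that slack on short intervals above the nominal maximum voltage. A real scheme would track each failure mechanism separately and use calibrated parameters.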