Multi-mechanism reliability modeling and management in dynamic systems

  • Authors:
  • Eric Karl; David Blaauw; Dennis Sylvester; Trevor Mudge

  • Affiliations:
  • Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI; Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI; Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI; Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI

  • Venue:
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems
  • Year:
  • 2008

Abstract

Reliability failure mechanisms, such as time-dependent dielectric breakdown (TDDB), electromigration, and negative bias temperature instability (NBTI), have become a key concern in integrated circuit (IC) design. The traditional approach to reliability qualification assumes that the system will operate at maximum performance continuously under worst-case voltage and temperature conditions. In reality, due to widely varying environmental conditions and an increased use of dynamic control techniques, such as dynamic voltage scaling and sleep modes, the typical system spends only a very small fraction of its operational time at maximum voltage and temperature. In this paper, we show how this results in a reliability "slack" that can be leveraged to provide increased performance during periods of peak processing demand. We develop a novel, real-time reliability model based on workload-driven conditions. Based on this model, we then propose a new dynamic reliability management (DRM) scheme that yields a 20%-35% performance improvement during periods of peak computational demand while ensuring the required reliability lifetime.
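
The notion of reliability "slack" can be illustrated with a minimal sketch. It is not the model developed in the paper, which the abstract does not specify. It assumes a generic wear-out rate that grows exponentially with voltage and follows an Arrhenius law in temperature, normalized to 1.0 at the worst-case corner, and it assumes damage accumulates linearly over time. The function names and the values of `v_max`, `t_max`, `gamma`, and `ea` are hypothetical placeholders chosen only for illustration.

```python
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K


def acceleration(v, t_kelvin, v_max=1.2, t_max=378.0, gamma=10.0, ea=0.7):
    """Wear-out rate relative to the worst-case corner (1.0 at v_max, t_max).

    Assumed form: exponential voltage acceleration times Arrhenius
    temperature acceleration. gamma (1/V) and ea (eV) are placeholders,
    not values taken from the paper.
    """
    voltage_term = math.exp(gamma * (v - v_max))
    thermal_term = math.exp((ea / K_B) * (1.0 / t_max - 1.0 / t_kelvin))
    return voltage_term * thermal_term


def consumed_fraction(intervals, lifetime_hours):
    """Fraction of the lifetime budget used by (hours, volts, kelvin) intervals.

    The budget is defined as lifetime_hours of continuous worst-case operation.
    """
    return sum(h * acceleration(v, t) for h, v, t in intervals) / lifetime_hours


def slack(intervals, lifetime_hours):
    """Budget a worst-case assumption would have charged minus what was used."""
    elapsed = sum(h for h, _, _ in intervals)
    return elapsed / lifetime_hours - consumed_fraction(intervals, lifetime_hours)


# Hypothetical workload: mostly low-voltage, cool operation with brief peaks.
workload = [(900.0, 0.9, 320.0), (100.0, 1.2, 378.0)]
print(slack(workload, lifetime_hours=10_000.0))
```

Under these assumptions, a system that runs continuously at the worst-case corner has zero slack, while any time spent at lower voltage or temperature leaves a positive remainder. That remainder is the kind of budget a DRM scheme could spend on temporarily higher performance.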