Managing multi-core soft-error reliability through utility-driven cross domain optimization

  • Authors:
  • Wangyuan Zhang; Tao Li

  • Affiliations:
  • Intelligent Design of Efficient Architecture Lab (IDEAL), Department of Electrical and Computer Engineering, University of Florida, USA

  • Venue:
  • ASAP '08 Proceedings of the 2008 International Conference on Application-Specific Systems, Architectures and Processors
  • Year:
  • 2008

Abstract

As semiconductor processing technology continues to scale down, managing reliability becomes an increasingly difficult challenge in high-performance microprocessor design. Transient faults, also known as soft errors, corrupt program data at the circuit level and cause incorrect program execution and system crashes. Future processors will consist of billions of transistors organized as multi-core microarchitectures. Packaging multiple cores (and hence more transistors) onto the same die exposes more devices to soft error strikes. This paper explores utility-function-driven (benefit-driven) cross-domain optimization for both performance and reliability. We propose utility-based resource management for individual cores, combined with utility-based shared cache partitioning across multiple cores. Moreover, we coordinate the optimization of multiple resources based on their cross-domain utility information to achieve attractive performance and reliability tradeoffs. Extensive experimental results show that, on average, our utility-driven cross-domain optimization reduces the soft error rate of the most vulnerable core in a chip multiprocessor (CMP) by up to 35% and improves the CMP’s overall reliability by 22%, with less than 3% performance degradation across 15 investigated workloads.