Microarchitecture soft error vulnerability characterization and mitigation under 3D integration technology

  • Authors:
  • Wangyuan Zhang; Tao Li

  • Affiliations:
  • Intelligent Design of Efficient Architecture Lab (IDEAL), Department of Electrical and Computer Engineering, University of Florida, Gainesville, USA, 32611

  • Venue:
  • Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
  • Year:
  • 2008

Abstract

As semiconductor processing techniques continue to scale down, transient faults, also known as soft errors, are increasingly becoming a reliability threat to high-performance microprocessors fabricated using state-of-the-art CMOS technologies. Emerging 3D chip integration techniques leverage vertically stacked structures to reduce on-chip wire delay and have shown the capability of overcoming interconnect bottlenecks as well as reducing power consumption. While the benefits of 3D die stacking for microprocessor performance and power have been extensively investigated recently, its implications for transient fault susceptibility are largely unknown. In this work, we make the first attempt to characterize microarchitecture soft error vulnerabilities across the stacked chip layers under 3D integration technologies. Using models and simulations that capture the physical mechanisms of soft errors and their circuit- and architecture-level impact, our study reveals opportunities to leverage two characteristics of 3D integration (the vertically stacked structure and the incorporation of heterogeneous process technologies) to achieve enhanced reliability. We show that the first characteristic allows outer layers to shield inner layers from particle strikes, and that the second enables the deployment of error-resilient device techniques (e.g. Silicon-On-Insulator) on vulnerable layers to achieve a reliability target while minimizing manufacturing cost. We further propose a set of microarchitecture techniques that can effectively exploit the reliability benefits offered by 3D technologies. For example, we propose scheduling vulnerable in-flight instructions to reliable layers and designing robust register files by combining reliability-hardened circuits, program value vulnerability and 3D integration techniques. Experimental results show that these techniques reduce 3D microarchitectures’ soft error rate by up to 88% compared to a planar design. We further evaluate the thermal implications of the proposed techniques and conclude that their impact on chip temperature is negligible.