Improving yield and reliability of chip multiprocessors

  • Authors: Abhisek Pan; Omer Khan; Sandip Kundu

  • Affiliations: University of Massachusetts, Amherst, MA (all authors)

  • Venue: Proceedings of the Conference on Design, Automation and Test in Europe
  • Year: 2009

Abstract

An increasing number of hardware failures can be attributed to device reliability problems that cause partial system failure or shutdown. In this paper we propose a scheme for improving the reliability of a homogeneous chip multiprocessor (CMP) that also improves manufacturing yield. Our solution exploits the natural redundancy that already exists in multi-core systems: when a functional unit in one core is defective, that core uses the corresponding unit in another core. A micro-architectural modification allows a core on a CMP to use another core as a coprocessor to service any instruction that the former cannot execute correctly. This service improves yield and reliability at the cost of some performance. To quantify this loss, we used a cycle-accurate simulator to model a dual-core system in which one or both cores sustain a partial failure. Our results indicate that when a large and sparingly used unit such as a floating-point arithmetic unit fails in a core, each faulty core can continue to run with help from companion cores, even on a floating-point-intensive benchmark, with a performance impact of as little as 10% and an area overhead of less than 1%.
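
The trade-off described in the abstract can be illustrated with a back-of-the-envelope model. The sketch below is not the authors' simulator or methodology; it is a minimal first-order estimate that assumes every instruction sent to a companion core adds a fixed number of extra cycles. The function name `slowdown` and its parameters (`offload_fraction`, `base_cpi`, `remote_penalty_cycles`) are hypothetical, and the numbers in the usage lines are illustrative only, not taken from the paper.

```python
def slowdown(offload_fraction, base_cpi, remote_penalty_cycles):
    """Estimate the relative execution-time increase of a faulty core.

    Assumption (not from the paper): each instruction serviced by a
    companion core costs a fixed number of extra cycles, and all other
    instructions are unaffected.

    offload_fraction      -- share of instructions needing the failed unit (0..1)
    base_cpi              -- cycles per instruction of the fault-free core
    remote_penalty_cycles -- extra cycles per remotely serviced instruction
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    if base_cpi <= 0:
        raise ValueError("base_cpi must be positive")
    if remote_penalty_cycles < 0:
        raise ValueError("remote_penalty_cycles must be non-negative")

    # Average CPI once the remote-service penalty is amortized over all
    # instructions, compared against the fault-free baseline.
    faulty_cpi = base_cpi + offload_fraction * remote_penalty_cycles
    return faulty_cpi / base_cpi - 1.0


# Illustrative inputs only: 20% of instructions offloaded, 4 extra cycles each.
estimate = slowdown(offload_fraction=0.2, base_cpi=2.0, remote_penalty_cycles=4.0)
print(f"estimated slowdown: {estimate:.1%}")
```

Under this simplified model the loss grows linearly with the share of instructions that need the failed unit, which is consistent with the abstract's emphasis on large but sparingly used units. A cycle-accurate simulation, as used in the paper, would also capture effects this sketch ignores, such as contention on the companion core.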