CASPAR: hardware patching for multi-core processors

  • Authors:
  • Ilya Wagner;Valeria Bertacco

  • Affiliations:
  • University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI

  • Venue:
  • Proceedings of the Conference on Design, Automation and Test in Europe
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Ensuring correctness of execution of complex multi-core processor systems deployed in the field remains to this day an extremely challenging task. The major part of this effort is concentrated on design verification, where different pre- and post-silicon techniques are used to guarantee that devices behave exactly as stated in the specification. Unfortunately, the performance of even state-of-the-art validation tools lags behind the growing complexity of multi-core designs. Therefore, subtle bugs still slip into released components, causing incorrect computational results, or even compromising the security of the end-user systems. In this work we present Caspar - an approach for in-the-field patching of the memory subsystem hardware in multi-core chips. Caspar relies on a checkpointing system, which periodically logs the state of the chip, and a novel error detection and recovery scheme, which uses a simplified mode of operation to bypass cache coherence and consistency errors. The implementation of Caspar employs hardware detectors: on-die programmable circuits to identify system's configurations that may lead to bugs, and to trigger recovery and bypass. Our experimental results show that Caspar can be used effectively to detect and bypass a variety of memory subsystem bugs, with as little as 2% performance impact and 6% area overhead during bug-free operation.