The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Dynamic Verification of Memory Consistency in Cache-Coherent Multithreaded Computer Architectures
DSN '06 Proceedings of the International Conference on Dependable Systems and Networks
Using Field-Repairable Control Logic to Correct Design Errors in Microprocessors
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.00 |
Ensuring correctness of execution of complex multi-core processor systems deployed in the field remains to this day an extremely challenging task. The major part of this effort is concentrated on design verification, where different pre- and post-silicon techniques are used to guarantee that devices behave exactly as stated in the specification. Unfortunately, the performance of even state-of-the-art validation tools lags behind the growing complexity of multi-core designs. Therefore, subtle bugs still slip into released components, causing incorrect computational results, or even compromising the security of the end-user systems. In this work we present Caspar - an approach for in-the-field patching of the memory subsystem hardware in multi-core chips. Caspar relies on a checkpointing system, which periodically logs the state of the chip, and a novel error detection and recovery scheme, which uses a simplified mode of operation to bypass cache coherence and consistency errors. The implementation of Caspar employs hardware detectors: on-die programmable circuits to identify system's configurations that may lead to bugs, and to trigger recovery and bypass. Our experimental results show that Caspar can be used effectively to detect and bypass a variety of memory subsystem bugs, with as little as 2% performance impact and 6% area overhead during bug-free operation.