DIVA: a reliable substrate for deep submicron microarchitecture design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Guarded commands, nondeterminacy and formal derivation of programs
Communications of the ACM
ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A survey of rollback-recovery protocols in message-passing systems
ACM Computing Surveys (CSUR)
Concurrent Error Detection Using Watchdog Processors-A Survey
IEEE Transactions on Computers
Reliable Floating-Point Arithmetic Algorithms for Error-Coded Operands
IEEE Transactions on Computers
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Ultra low-cost defect protection for microprocessor pipelines
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Libckpt: transparent checkpointing under Unix
TCON'95 Proceedings of the USENIX 1995 Technical Conference Proceedings
IFRA: instruction footprint recording and analysis for post-silicon bug localization in processors
Proceedings of the 45th annual Design Automation Conference
Hi-index | 0.00 |
The increasing pressure to make hardware resilient to runtime failures has prompted development of design techniques for specific classes of systems, e.g. processors and routers. However, these techniques come at increased design and verification costs, thus limiting their broader application. In this work we describe a methodology for general RTL designs based on the widely usable checkpointing and rollback resiliency mechanism. We take a modeling and language approach that provides an appropriate set of abstractions for the resiliency logic. This cleanly separates the main design behavior from the resiliency behavior, leading to ease of design. Further, as the language abstractions can be automatically synthesized into resiliency logic, our methodology can merge with existing design flows. The concerns of verifying this additional resiliency logic can be addressed by synthesizing behavioral assertions capturing correct behavior. We demonstrate the use of this methodology on four examples, with synthesis for performance and area to estimate the overhead of the additional synthesis logic.