Specification and synthesis of hardware checkpointing and rollback mechanisms

Authors:
Carven Chan;Daniel Schwartz-Narbonne;Divjyot Sethi;Sharad Malik
Affiliations:
Princeton University, New Jersey;Princeton University, New Jersey;Princeton University, New Jersey;Princeton University, New Jersey
Venue:
Proceedings of the 49th Annual Design Automation Conference
Year:
2012

Abstract

The increasing pressure to make hardware resilient to runtime failures has prompted the development of design techniques for specific classes of systems, e.g., processors and routers. However, these techniques come at increased design and verification costs, which limits their broader application. In this work, we describe a methodology for general RTL designs based on checkpointing and rollback, a widely applicable resiliency mechanism. We take a modeling and language approach that provides an appropriate set of abstractions for the resiliency logic. This cleanly separates the main design behavior from the resiliency behavior, which eases design. Further, because the language abstractions can be automatically synthesized into resiliency logic, our methodology can merge with existing design flows. Concerns about verifying this additional resiliency logic can be addressed by synthesizing behavioral assertions that capture its correct behavior. We demonstrate the use of this methodology on four examples, synthesizing each for performance and area to estimate the overhead of the additional synthesized logic.
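As a rough illustration of the checkpoint-and-rollback idea the abstract refers to, the following is a minimal software sketch in Python. It is not the paper's specification language or its synthesized RTL: the class `CheckpointedRegisters`, its methods, and the `demo` function are hypothetical names chosen for this example, and a single shadow copy of the state is an assumption made for simplicity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CheckpointedRegisters:
    """Toy model of design state with one shadow copy for rollback (hypothetical)."""

    regs: Dict[str, int] = field(default_factory=dict)
    _shadow: Optional[Dict[str, int]] = field(default=None, init=False, repr=False)

    def write(self, name: str, value: int) -> None:
        # Main design behavior: an ordinary state update.
        self.regs[name] = value

    def checkpoint(self) -> None:
        # Resiliency behavior: snapshot the current state.
        self._shadow = dict(self.regs)

    def rollback(self) -> None:
        # Resiliency behavior: restore the last snapshot after an error.
        if self._shadow is None:
            raise RuntimeError("rollback requested before any checkpoint")
        self.regs = dict(self._shadow)


def demo() -> Dict[str, int]:
    rf = CheckpointedRegisters()
    rf.write("pc", 4)
    rf.write("acc", 10)
    rf.checkpoint()

    # Pretend a runtime fault corrupts the updates made after the checkpoint.
    rf.write("acc", 999)
    rf.write("pc", 8)
    rf.rollback()

    # A behavioral assertion in the spirit of the abstract: after rollback,
    # the visible state must equal the checkpointed state.
    assert rf.regs == {"pc": 4, "acc": 10}
    return rf.regs
```

The sketch keeps the state-update method separate from the checkpoint and rollback methods, loosely mirroring the separation of main design behavior from resiliency behavior; how the actual methodology expresses and synthesizes this is described in the paper itself, not here.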