COMA: an opportunity for building fault-tolerant scalable shared memory multiprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Reliable computer systems (3rd ed.): design and evaluation
Reliable computer systems (3rd ed.): design and evaluation
ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
IBM's S/390 G5 Microprocessor Design
IEEE Micro
Prototyping Architectural Support for Program Rollback Using FPGAs
FCCM '05 Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Ultra low-cost defect protection for microprocessor pipelines
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
International Journal of High Performance Computing Applications
Securing the data path of next-generation router systems
Computer Communications
Reliability analysis for MPSoCs with mixed-critical, hard real-time constraints
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Specification and synthesis of hardware checkpointing and rollback mechanisms
Proceedings of the 49th Annual Design Automation Conference
Proceedings of the 26th ACM international conference on Supercomputing
CSER: HW/SW configurable soft-error resiliency for application specific instruction-set processors
Proceedings of the Conference on Design, Automation and Test in Europe
RASTER: runtime adaptive spatial/temporal error resiliency for embedded processors
Proceedings of the 50th Annual Design Automation Conference
Reli: hardware/software checkpoint and recovery scheme for embedded processors
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
The Journal of Supercomputing
DHASER: dynamic heterogeneous adaptation for soft-error resiliency in ASIP-based multi-core systems
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.01 |
Existing cache-level checkpointing schemes do not continuously support a large rollback window. Immediately after a checkpoint, the number of instructions that the processor can undo falls to zero. To address this problem, we introduce SWICH, an FPGA-based prototype of a new cache-level scheme that keeps two live checkpoints at all times, forming a sliding rollback window that maintains a large minimum and average length.