The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
Reliable computer systems (3rd ed.): design and evaluation
Reliable computer systems (3rd ed.): design and evaluation
Transient fault detection via simultaneous multithreading
Proceedings of the 27th annual international symposium on Computer architecture
ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Fault Injection Techniques and Tools
Computer
IBM's S/390 G5 Microprocessor Design
IEEE Micro
A User-level Checkpointing Library for POSIX Threads Programs
FTCS '99 Proceedings of the Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing
An Architectural Framework for Providing Reliability and Security Support
DSN '04 Proceedings of the 2004 International Conference on Dependable Systems and Networks
Application-level checkpointing for shared memory programs
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Basic Concepts and Taxonomy of Dependable and Secure Computing
IEEE Transactions on Dependable and Secure Computing
Rapid Embedded Hardware/Software System Generation
VLSID '05 Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design
Design at the end of the silicon roadmap
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
IMPRES: integrated monitoring for processor reliability and security
Proceedings of the 43rd annual Design Automation Conference
Cost-efficient soft error protection for embedded microprocessors
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
An OS-level Framework for Providing Application-Aware Reliability
PRDC '06 Proceedings of the 12th Pacific Rim International Symposium on Dependable Computing
ASP-DAC '07 Proceedings of the 2007 Asia and South Pacific Design Automation Conference
Challenges and Solutions for Late- and Post-Silicon Design
IEEE Design & Test
Processor Description Languages
Processor Description Languages
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
CSER: HW/SW configurable soft-error resiliency for application specific instruction-set processors
Proceedings of the Conference on Design, Automation and Test in Europe
RASTER: runtime adaptive spatial/temporal error resiliency for embedded processors
Proceedings of the 50th Annual Design Automation Conference
DHASER: dynamic heterogeneous adaptation for soft-error resiliency in ASIP-based multi-core systems
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.00 |
Checkpoint and Recovery (CR) allows computer systems to operate correctly even when compromised by transient faults. While many software systems and hardware systems for CR do exist, they are usually either too large, require major modifications to the software, too slow, or require extensive modifications to the caching schemes. In this paper, we propose a novel error-recovery management scheme, which is based upon re-engineering the instruction set. We take the native instruction set of the processor and enhance the microinstructions with additional micro-operations which enable checkpointing. The recovery mechanism is implemented by three custom instructions, which recover the registers which were changed, the data memory values which were changed and the special registers (PC, status registers etc.) which were changed. Our checkpointing storage is changed according to the benchmark executed. Results show that our method degrades performance by just 1.45% under fault free conditions, and incurs area overhead of 45% on average and 79% in the worst case. The recovery takes just 62 clock cycles (worst case) in the examples which we examined.