Processor-Level Selective Replication

Authors:
Nithin Nakka;Karthik Pattabiraman;Ravishankar Iyer
Affiliations:
Center for Reliable and High Performance Computing;Center for Reliable and High Performance Computing;Center for Reliable and High Performance Computing
Venue:
DSN '07 Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
Year:
2007

Abstract

We propose a processor-level technique called Selective Replication, by which the application can choose where in its application stream and to what degree it requires replication. Recent work on static analysis and fault-injection-based experiments on applications reveals that certain variables in the application are critical to its crash- and hang-free execution. If the computation of just these variables can be ensured to be error-free, then a high degree of crash/hang coverage can be achieved at a low performance overhead to the application. The Selective Replication technique provides an ideal platform for validating this claim. The technique is compared against complete duplication as provided in current architecture-level techniques. The results show that, with about 59% less overhead than full duplication, selective replication detects 97% of the data errors and 87% of the instruction errors that were covered by full duplication. It also detects 17.8% fewer errors that are benign to the final outcome of the application than full duplication does.
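
The idea of replicating only the critical computations, to a degree chosen by the application, can be illustrated with a small software-level analogy. This is only a sketch under stated assumptions: the paper's mechanism operates inside the processor, whereas the code below re-executes a Python function and compares the results. The names `replicated`, `ReplicaMismatch`, and `make_faulty` are hypothetical and do not come from the paper.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ReplicaMismatch(Exception):
    """Raised when replicas of a critical computation disagree."""


def replicated(compute: Callable[[], T], degree: int = 2) -> T:
    """Run `compute` `degree` times and compare the results.

    degree=1 means no replication (the non-critical path);
    degree=2 is duplication; higher degrees add more replicas.
    Hypothetical helper: a software stand-in for hardware replication.
    """
    if degree < 1:
        raise ValueError("degree must be at least 1")
    results = [compute() for _ in range(degree)]
    first = results[0]
    # Any disagreement between replicas is reported as a detected error.
    if any(r != first for r in results[1:]):
        raise ReplicaMismatch(f"replicas disagree: {results!r}")
    return first


def make_faulty(value: int, flip_on_call: int) -> Callable[[], int]:
    """Build a computation that returns a bit-flipped value on one call.

    Hypothetical fault injector used only to exercise the comparison.
    """
    calls = {"n": 0}

    def compute() -> int:
        calls["n"] += 1
        # Flip the lowest bit on the chosen call to emulate a transient error.
        return value ^ 1 if calls["n"] == flip_on_call else value

    return compute


if __name__ == "__main__":
    # Critical variable: computed twice and compared.
    index = replicated(lambda: 2 + 3, degree=2)
    print("critical value:", index)

    # Injected transient error in the second replica is detected.
    try:
        replicated(make_faulty(8, flip_on_call=2), degree=2)
    except ReplicaMismatch as err:
        print("detected:", err)
```

In this sketch the caller picks the degree per computation, so non-critical values can run with `degree=1` and pay no replication cost, while critical values are duplicated or replicated further.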