Fault screeners are a new breed of fault identification technique that can probabilistically detect whether a transient fault has affected the state of a processor. We demonstrate that fault screeners function because of two key characteristics. First, we show that much of the intermediate data generated by a program inherently falls within certain consistent bounds. Second, we observe that these bounds are often violated by the introduction of a fault. Thus, fault screeners can identify faults by directly watching for any data inconsistencies arising in an application's behavior. We present an idealized algorithm capable of identifying over 85% of injected faults on the SpecInt suite and over 75% overall. Further, in a realistic implementation on a simulated Pentium-III-like processor, about half of the errors due to injected faults are identified while still in speculative state. Errors detected this early can be eliminated by a pipeline flush. In this paper, we present several hardware-based versions of this screening algorithm and show that flushing the pipeline every time the hardware screener triggers reduces overall performance by less than 1%.
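The two characteristics above (values produced by a program tend to stay within consistent bounds, and a fault often pushes a value outside them) can be illustrated with a small software sketch. This is not the screening algorithm or hardware from the paper: it assumes a simple per-instruction min/max range check, and the names `RangeScreener`, `flip_bit`, and `run_demo`, the address `0x400`, and the 32-bit value width are all hypothetical choices made for illustration.

```python
import random


class RangeScreener:
    """Toy screener: remembers the min/max value seen per instruction address.

    Assumption for illustration only; the paper's screeners may differ.
    """

    def __init__(self):
        self.bounds = {}  # pc -> (lowest value seen, highest value seen)

    def train(self, pc, value):
        """Widen the learned bounds for pc so that they include value."""
        lo, hi = self.bounds.get(pc, (value, value))
        self.bounds[pc] = (min(lo, value), max(hi, value))

    def check(self, pc, value):
        """Return True if value lies outside the bounds learned for pc."""
        if pc not in self.bounds:
            return False  # nothing learned yet, so nothing to compare against
        lo, hi = self.bounds[pc]
        return not (lo <= value <= hi)


def flip_bit(value, bit):
    """Model a transient fault as a single bit flip."""
    return value ^ (1 << bit)


def run_demo(seed=0, trials=1000):
    """Inject random single-bit flips and return the fraction that is flagged."""
    rng = random.Random(seed)
    screener = RangeScreener()
    pc = 0x400  # hypothetical address of an instruction producing a byte-sized value
    for value in range(256):
        screener.train(pc, value)  # fault-free values span 0..255

    detected = 0
    for _ in range(trials):
        value = rng.randrange(256)
        faulty = flip_bit(value, rng.randrange(32))  # assume a 32-bit register
        if screener.check(pc, faulty):
            detected += 1
    return detected / trials


if __name__ == "__main__":
    print(f"fraction of injected flips flagged: {run_demo():.2f}")
```

In this toy setting, a flip in one of the low 8 bits leaves the value inside the trained range and goes unnoticed, while a flip in any higher bit is flagged. That is the sense in which screening is probabilistic rather than exact, and the detection rate here says nothing about the rates reported for real workloads.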