This paper presents a hierarchical simulation methodology that enables accurate system evaluation under realistic faults and conditions. In this methodology, the effects of low-level (i.e., transistor- or circuit-level) faults are propagated to higher levels (i.e., the system level) using fault dictionaries. The primary fault models are obtained by simulating the transistor-level effect of a radiation particle penetrating a device. The resulting current bursts constitute the first-level fault dictionary and are used in circuit-level simulation to determine the impact on circuit latches and flip-flops. The latched outputs constitute the next-level fault dictionary in the hierarchy and are used for fault injection simulation at the chip level under selected workloads or application programs. Faults injected at the chip level result in memory corruptions, which form the next-level fault dictionary for the system-level simulation of an application running on simulated hardware. When the application terminates, either normally or abnormally, the overall fault impact on software behavior is quantified and analyzed. The system in this sense can be a single workstation or a network. The simulation method is demonstrated and validated in a case study of a network system based on Myrinet (a commercial, high-speed network).
The study shows that the method: 1) allows detailed simulation of faults at lower levels and effective fault propagation through the system to the user-visible higher levels using fault dictionaries; 2) links physical faults with effects that the user can observe at the higher levels, and thus provides a foundation for realistic fault injection studies; 3) significantly reduces the number of simulations needed, owing to the fault dictionary method; 4) offers high confidence in the evaluation results because the system is analyzed in the presence of realistic fault conditions; and 5) provides valuable feedback for designing error recovery mechanisms to improve dependability.
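The chaining of fault dictionaries described above can be illustrated with a minimal Python sketch. It is not the authors' tooling: every function name, unit, threshold, and outcome label below is a hypothetical placeholder, and each level is modeled as a deterministic toy function, whereas a real fault dictionary would typically record many simulated outcomes per fault. The sketch only shows the structural idea that each level is simulated once per distinct input, its outcomes are stored, and higher levels consume the stored outcomes instead of re-running the lower-level simulation.

```python
from collections import Counter

# All models, units, and thresholds here are hypothetical placeholders.

def transistor_sim(charge_fc):
    """Toy transistor-level model: deposited charge (fC) -> current burst (uA)."""
    return 100 * charge_fc

def circuit_sim(burst_ua):
    """Toy circuit-level model: a large enough burst is latched as a bit flip."""
    return "bit_flip" if burst_ua >= 500 else "masked"

def chip_sim(latched):
    """Toy chip-level model: a latched bit flip corrupts memory under the workload."""
    return "memory_corruption" if latched == "bit_flip" else "no_effect"

def system_sim(corruption):
    """Toy system-level model: memory corruption makes the application fail."""
    if corruption == "memory_corruption":
        return "abnormal_termination"
    return "normal_termination"

def build_dictionary(inputs, simulate):
    """Run one level's simulation once per distinct input and store the outcome."""
    return {x: simulate(x) for x in set(inputs)}

def build_hierarchy(charges):
    """Build the dictionaries bottom-up; each level's outputs feed the next level."""
    levels = [transistor_sim, circuit_sim, chip_sim, system_sim]
    dictionaries, inputs = [], charges
    for simulate in levels:
        d = build_dictionary(inputs, simulate)
        dictionaries.append(d)
        inputs = d.values()
    return dictionaries

def propagate(fault, dictionaries):
    """Follow one low-level fault up the hierarchy using lookups only."""
    for d in dictionaries:
        fault = d[fault]
    return fault

CHARGES = [1, 3, 5, 8, 10]           # hypothetical particle charges (fC)
HIERARCHY = build_hierarchy(CHARGES)
OUTCOMES = Counter(propagate(c, HIERARCHY) for c in CHARGES)
print(OUTCOMES)
```

In this toy setup the five transistor-level faults collapse to two distinct circuit-level outcomes, so the chip-level and system-level models are each evaluated only twice. That collapse is the kind of saving in simulation count that the fault dictionary approach is meant to provide, although the actual reduction depends on the real models.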