Examining ACE analysis reliability estimates using fault-injection

Authors:
Nicholas J. Wang; Aqeel Mahesri; Sanjay J. Patel
Affiliations:
University of Illinois, Urbana-Champaign, IL; University of Illinois, Urbana-Champaign, IL; University of Illinois, Urbana-Champaign, IL
Venue:
Proceedings of the 34th Annual International Symposium on Computer Architecture
Year:
2007

Abstract

ACE analysis is a technique to provide an early reliability estimate for microprocessors. ACE analysis couples data from abstract performance models with low-level design details to identify and rule out transient faults that will not cause incorrect execution. While many transient faults are analyzable in ACE analysis frameworks, some are not. As a result, ACE analysis is conservative and provides a lower bound for the reliability of a processor design. Bounding the reliability of a design is useful since it can guarantee that the given design will meet reliability goals. In this work, we quantify and identify the sources of ACE analysis conservatism by comparing an ACE analysis methodology against a rigorous fault-injection study. We evaluate two flavors of ACE analysis: a "simple" analysis and a refined analysis, finding that even the refined analysis overestimates the soft error vulnerability of an instruction scheduler by 2-3x. The conservatism stems from two key sources: a lack of detail in abstract performance models, and what we term Y-Bits, a result of the single-pass simulation methodology that is typical of ACE analysis. We also examine the efficacy of applying ACE analysis to a class of "partial coverage" error mitigation techniques. In particular, we perform a case study on one such technique and extrapolate our findings to others.
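
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' tooling. It assumes an ACE analysis that reports bit-cycle counts in three classes (ACE, un-ACE, and unclassified) and a fault-injection campaign that reports how many injected faults led to incorrect execution. The names `BitCycleCounts`, `ace_avf`, `injection_avf`, and `conservatism` are hypothetical, and the numbers in the usage note are made up for illustration, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class BitCycleCounts:
    """Hypothetical output of an ACE analysis for one hardware structure."""
    ace: int      # bit-cycles shown to be required for correct execution
    un_ace: int   # bit-cycles shown not to affect correct execution
    unknown: int  # bit-cycles the analysis could not classify


def ace_avf(counts: BitCycleCounts) -> float:
    """Conservative vulnerability estimate: unclassified bit-cycles count as ACE."""
    total = counts.ace + counts.un_ace + counts.unknown
    if total == 0:
        raise ValueError("no bit-cycles recorded")
    return (counts.ace + counts.unknown) / total


def injection_avf(failures: int, trials: int) -> float:
    """Vulnerability estimate from fault injection: fraction of trials that failed."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= failures <= trials:
        raise ValueError("failures must lie between 0 and trials")
    return failures / trials


def conservatism(counts: BitCycleCounts, failures: int, trials: int) -> float:
    """Ratio of the ACE estimate to the fault-injection estimate."""
    measured = injection_avf(failures, trials)
    if measured == 0:
        raise ValueError("no failures observed; ratio is undefined")
    return ace_avf(counts) / measured
```

With illustrative inputs such as `BitCycleCounts(ace=200, un_ace=700, unknown=100)` and 120 failures in 1000 injections, the ACE estimate is 0.3, the injection estimate is 0.12, and the ratio is roughly 2.5. Treating unclassified bit-cycles as ACE is what makes the estimate an upper bound on vulnerability, and therefore a lower bound on reliability.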