AVF Stressmark: Towards an Automated Methodology for Bounding the Worst-Case Vulnerability to Soft Errors

Authors:
Arun Arvind Nair; Lizy Kurian John; Lieven Eeckhout
Venue:
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2010

Abstract

Soft error reliability is increasingly becoming a first-order design concern for microprocessors, as a result of higher transistor counts, shrinking device geometries, and lower operating voltages. It is important for designers to be able to validate whether the Soft Error Rate (SER) targets of their design have been met, and to help end users select the processor best suited to their reliability goals. Knowledge of the observable worst-case SER allows designers to select their design point and to bound the worst-case vulnerability at that design point. We highlight the lack of a methodology for evaluating the overall observable worst-case SER. Hence, there is a clear need for a so-called stressmark that can demonstrably approach the observable worst-case SER. The worst case thus obtained can be used to identify reliability bottlenecks, validate the safety margins used for reliability design, and identify inadequacies in the benchmark suites used to evaluate SER. Starting from a comprehensive study of how microarchitecture-dependent program characteristics affect soft errors, we derive the insights needed to develop an automated and flexible methodology for generating a stressmark that approaches the maximum SER of an out-of-order processor. We demonstrate how our methodology enables architects to quantify the impact of SER-mitigation mechanisms on the worst-case SER of the processor. The stressmark achieves 1.4X higher SER in the core, 2.5X higher SER in the DL1 and DTLB, and 1.5X higher SER in the L2 than the highest SER induced by SPEC CPU2006 and MiBench programs.