The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
DIVA: a reliable substrate for deep submicron microarchitecture design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Fault Injection Techniques and Tools
Computer
The Scalability of FFT on Parallel Computers
IEEE Transactions on Parallel and Distributed Systems
Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
Fault Injection Techniques and Tools for Embedded Systems
Fault Injection Techniques and Tools for Embedded Systems
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
SWIFT: Software Implemented Fault Tolerance
Proceedings of the international symposium on Code generation and optimization
Linux Journal
Design and Evaluation of Hybrid Fault-Detection Systems
Proceedings of the 32nd annual international symposium on Computer Architecture
Opportunistic Transient-Fault Detection
Proceedings of the 32nd annual international symposium on Computer Architecture
Compiler-guided register reliability improvement against soft errors
Proceedings of the 5th ACM international conference on Embedded software
Computing Cache Vulnerability to Transient Errors and Its Implication
DFT '05 Proceedings of the 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems
SlicK: slice-based locality exploitation for efficient redundant multithreading
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Software-Implemented Hardware Fault Tolerance
Software-Implemented Hardware Fault Tolerance
Dynamic prediction of architectural vulnerability from microarchitectural state
Proceedings of the 34th annual international symposium on Computer architecture
Compiler-Managed Software-based Redundant Multi-Threading for Transient Fault Detection
Proceedings of the International Symposium on Code Generation and Optimization
Thousand core chips: a technology perspective
Proceedings of the 44th annual Design Automation Conference
Fault-Tolerance CMP Architecture based on SMT Technology
IMSCCS '07 Proceedings of the Second International Multi-Symposiums on Computer and Computational Sciences
Quantifying software vulnerability
Proceedings of the 2008 workshop on Radiation effects and fault tolerance in nanometer technologies
Online Estimation of Architectural Vulnerability Factor for Soft Errors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
A Redundancy Mechanism under Single Chip Multiprocessor Architecture
SEC '08 Proceedings of the 2008 Fifth IEEE International Symposium on Embedded Computing
Mixed-mode multicore reliability
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Engineering A Compiler
A compiler optimization to reduce soft errors in register files
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Computing and Minimizing Cache Vulnerability to Transient Errors
IEEE Design & Test
Cache vulnerability equations for protecting data in embedded processor caches from soft errors
Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
Protective redundancy overhead reduction using instruction vulnerability factor
Proceedings of the 7th ACM international conference on Computing frontiers
Characterizing the soft error vulnerability of multicores running multithreaded applications
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
Continuously reducing transistor sizes and aggressive low power operating modes employed by modern architectures tend to increase transient error rates. Concurrently, multicore machines are dominating the architectural spectrum today in various application domains. These two trends require a fresh look at resiliency of multithreaded applications against transient errors from a software perspective. In this paper, we propose and evaluate a new metric called the Thread Vulnerability Factor (TVF). A distinguishing characteristic of TVF is that its calculation for a given thread (which is typically one of the threads of a multithreaded application) does not depend on its code alone, but also on the codes of the threads that share resources and data with that thread. As a result, we decompose TVF of a thread into two complementary parts: local and remote. While the former captures the TVF induced by the code of the target thread, the latter represents the vulnerability impact of the threads that interact with the target thread. We quantify the local and remote TVF values for three architectural components (register file, ALUs, and caches) using a set of ten multithreaded applications from the Parsec and Splash-2 benchmark suites. Our experimental evaluation shows that TVF values tend to increase as the number of cores increases, which means the system becomes more vulnerable as the core count rises. We further discuss how TVF metric can be employed to explore performance-reliability tradeoffs in multicores. Reliability-based analysis of compiler optimizations and redundancy-based fault tolerance are also mentioned as potential usages of our TVF metric.