Performance-reliability tradeoff analysis for multithreaded applications

Authors:
Isil Oz;Haluk Rahmi Topcuoglu;Mahmut Kandemir;Oguz Tosun
Affiliations:
Bogazici University, Istanbul, Turkey;Marmara University, Istanbul, Turkey;Pennsylvania State University, University Park, PA;Bogazici University, Istanbul, Turkey
Venue:
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Year:
2012


Abstract

Modern architectures are becoming more susceptible to transient errors as circuits scale down, which makes reliability an increasingly critical concern in computer systems. In general, there is a tradeoff between system reliability and the performance of multithreaded applications running on multicore architectures. In this paper, we conduct a performance-reliability analysis of different parallel versions of three data-intensive applications: FFT, Jacobi Kernel, and Water Simulation. We measure the performance of these programs by counting execution clock cycles, and we measure system reliability with the Thread Vulnerability Factor (TVF), a recently proposed metric that quantifies the vulnerability of a thread to hardware faults at a high level. We carry out experiments by executing the parallel implementations on multicore architectures and collecting performance and vulnerability data. Our experimental evaluation indicates that the choice is clear for the FFT application and the Jacobi Kernel. For FFT, the transpose algorithm incurs less than 5% performance loss, while its vulnerability increases by 20% compared to the binary-exchange algorithm. The unrolled Jacobi code reduces execution time by up to 50% with no significant change in vulnerability values. The tradeoff is more interesting for Water Simulation, where the nsquared version significantly reduces vulnerability values but degrades performance at a similar rate compared to the faster but more vulnerable spatial version.
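To make the "unrolled Jacobi code" mentioned above concrete, the following is a minimal sketch of a 2D Jacobi sweep next to a version whose inner loop is unrolled by a factor of two. It is an illustration only: the abstract does not give the paper's actual kernel, unroll factor, or parallelization, so the four-neighbor stencil, the factor of two, and the function names `jacobi_step` and `jacobi_step_unrolled` are all assumptions. In Python the transformation is shown for its structure, not for speed.

```python
def jacobi_step(grid):
    """One Jacobi sweep: each interior cell becomes the mean of its four neighbors."""
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # boundary cells are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def jacobi_step_unrolled(grid):
    """Same sweep with the inner loop unrolled by 2 (assumed factor, for illustration)."""
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        up, mid, down, out = grid[i - 1], grid[i], grid[i + 1], new[i]
        j = 1
        # Main loop: update two adjacent interior cells per iteration.
        while j + 1 < m - 1:
            out[j] = 0.25 * (up[j] + down[j] + mid[j - 1] + mid[j + 1])
            out[j + 1] = 0.25 * (up[j + 1] + down[j + 1] + mid[j] + mid[j + 2])
            j += 2
        # Remainder: one leftover interior cell when the interior width is odd.
        if j < m - 1:
            out[j] = 0.25 * (up[j] + down[j] + mid[j - 1] + mid[j + 1])
    return new
```

Both functions read only from the input grid and write to a fresh copy, so they produce identical results; unrolling changes only how much loop overhead is paid per updated cell.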