Skewed redundancy

Authors:
Gordon B. Bell;Mikko H. Lipasti
Affiliations:
IBM, Research Triangle Park, NC, USA;University of Wisconsin - Madison, Madison, WI, USA
Venue:
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Year:
2008

Citing 29
Cited 0

Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Data speculation support for a chip multiprocessor

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
A Chip-Multiprocessor Architecture with Speculative Multithreading

IEEE Transactions on Computers
Transient fault detection via simultaneous multithreading

Proceedings of the 27th annual international symposium on Computer architecture
Space/time trade-offs in hash coding with allowable errors

Communications of the ACM
Slipstream processors: improving both performance and fault tolerance

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
A large, fast instruction window for tolerating cache misses

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Detailed design and evaluation of redundant multithreading alternatives

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Dual use of superscalar datapath for transient-fault detection and recovery

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Cherry: checkpointed early resource recycling in out-of-order microprocessors

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors

FTCS '99 Proceedings of the Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Transient-fault recovery for chip multiprocessors

Proceedings of the 30th annual international symposium on Computer architecture
Improving processor performance by dynamically pre-processing the instruction stream

Improving processor performance by dynamically pre-processing the instruction stream
Beating in-order stalls with "flea-flicker" two-pass pipelining

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Continual flow pipelines

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Efficient Resource Sharing in Concurrent Error Detecting Superscalar Microarchitectures

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Scalable Hardware Memory Disambiguation for High-ILP Processors

IEEE Micro
Out-of-Order Commit Processors

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Understanding Scheduling Replay Schemes

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Detecting and exploiting causal relationships in hardware shared-memory multiprocessors

Detecting and exploiting causal relationships in hardware shared-memory multiprocessors
Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Deconstructing commit

ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Understanding prediction-based partial redundant threading for low-overhead, high- coverage fault tolerance

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Reunion: Complexity-Effective Multicore Redundancy

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Configurable isolation: building high availability systems with commodity multi-core processors

Proceedings of the 34th annual international symposium on Computer architecture
Optimizing Dual-Core Execution for Power Efficiency and Transient-Fault Recovery

IEEE Transactions on Parallel and Distributed Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Technology scaling in integrated circuits has consistently provided dramatic performance improvements in modern microprocessors. However, increasing device counts and decreasing on-chip voltage levels have made transient errors a first-order design constraint that can no longer be ignored. Several proposals have provided fault detection and tolerance through redundantly executing a program on an additional hardware thread or core. While such techniques can provide high fault coverage, they at best provide equivalent performance to the original execution and at worst incur a slowdown due to error checking, contention for shared resources, and synchronization overheads. This work achieves a similar goal of detecting transient errors by redundantly executing a program on an additional processor core, however it speeds up (rather than slows down) program execution compared to the unprotected baseline case. It makes the observation that a small number of instructions are detrimental to overall performance, and selectively skipping them enables one core to advance far ahead of the other to obtain prefetching and large instruction window benefits. We highlight the modest incremental hardware required to support skewed redundancy and demonstrate a speedup of 6%/54% for a collection of integer/floating point benchmarks while still providing 100% error detection coverage within our sphere of replication. Additionally, we show that a third core can further improve performance while adding error recovery capabilities.