IEEE Transactions on Computers
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Increasing superscalar performance through multistreaming
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture
Reliable computer systems (3rd ed.): design and evaluation
Reliable computer systems (3rd ed.): design and evaluation
DIVA: a reliable substrate for deep submicron microarchitecture design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Transient fault detection via simultaneous multithreading
Proceedings of the 27th annual international symposium on Computer architecture
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Slipstream processors: improving both performance and fault tolerance
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Transient-fault recovery using simultaneous multithreading
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Dual use of superscalar datapath for transient-fault detection and recovery
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Asim: A Performance Model Framework
Computer
IBM's S/390 G5 Microprocessor Design
IEEE Micro
The Alpha 21364 Network Architecture
IEEE Micro
Concurrent Error Detection Using Watchdog Processors-A Survey
IEEE Transactions on Computers
AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors
FTCS '99 Proceedings of the Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing
Incorporating fault tolerance in superscalar processors
HIPC '96 Proceedings of the Third International Conference on High-Performance Computing (HiPC '96)
Transient-fault recovery using simultaneous multithreading
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Transient-fault recovery for chip multiprocessors
Proceedings of the 30th annual international symposium on Computer architecture
ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes
Proceedings of the 30th annual international symposium on Computer architecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
SMTp: An Architecture for Next-generation Scalable Multi-threading
Proceedings of the 31st annual international symposium on Computer architecture
Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor
Proceedings of the 31st annual international symposium on Computer architecture
Fingerprinting: bounding soft-error detection latency and bandwidth
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
A Case for Clumsy Packet Processors
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
SWIFT: Software Implemented Fault Tolerance
Proceedings of the international symposium on Code generation and optimization
Logic soft errors in sub-65nm technologies design and CAD challenges
Proceedings of the 42nd annual Design Automation Conference
Design and Evaluation of Hybrid Fault-Detection Systems
Proceedings of the 32nd annual international symposium on Computer Architecture
Opportunistic Transient-Fault Detection
Proceedings of the 32nd annual international symposium on Computer Architecture
Exploiting Coarse-Grain Verification Parallelism for Power-Efficient Fault Tolerance
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
A Mechanism for Online Diagnosis of Hard Faults in Microprocessors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Performance/Watt: the new server focus
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Software-controlled fault tolerance
ACM Transactions on Architecture and Code Optimization (TACO)
Static typing for a faulty lambda calculus
Proceedings of the eleventh ACM SIGPLAN international conference on Functional programming
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
SlicK: slice-based locality exploitation for efficient redundant multithreading
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Reunion: Complexity-Effective Multicore Redundancy
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Architecting a reliable CMP switch architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Configurable isolation: building high availability systems with commodity multi-core processors
Proceedings of the 34th annual international symposium on Computer architecture
Mechanisms for bounding vulnerabilities of processor structures
Proceedings of the 34th annual international symposium on Computer architecture
Dynamic prediction of architectural vulnerability from microarchitectural state
Proceedings of the 34th annual international symposium on Computer architecture
Online diagnosis of hard faults in microprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Fault-tolerant typed assembly language
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Compiler-Managed Software-based Redundant Multi-Threading for Transient Fault Detection
Proceedings of the International Symposium on Code Generation and Optimization
Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
The visual vulnerability spectrum: characterizing architectural vulnerability for graphics hardware
GH '06 Proceedings of the 21st ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Reliability-aware Co-synthesis for Embedded Systems
Journal of VLSI Signal Processing Systems
Hierarchical Verification for Increasing Performance in Reliable Processors
Journal of Electronic Testing: Theory and Applications
Dependability, power, and performance trade-off on a multicore processor
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Communications of the ACM - Web science
Efficient fault tolerance in multi-media applications through selective instruction replication
Proceedings of the 2008 workshop on Radiation effects and fault tolerance in nanometer technologies
Globally optimized robust systems to overcome scaled CMOS reliability challenges
Proceedings of the conference on Design, automation and test in Europe
Techniques for Efficient Software Checking
Languages and Compilers for Parallel Computing
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Mixed-mode multicore reliability
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
ESoftCheck: Removal of Non-vital Checks for Fault Tolerance
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Compiler-assisted soft error detection under performance and energy constraints in embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Sequential element design with built-in soft error resilience
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
End-to-end register data-flow continuous self-test
Proceedings of the 36th annual international symposium on Computer architecture
System-on-Chip Test Architectures: Nanometer Design for Testability
System-on-Chip Test Architectures: Nanometer Design for Testability
REPAS: Reliable Execution for Parallel ApplicationS in Tiled-CMPs
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Architecture Design for Soft Errors
Architecture Design for Soft Errors
Selective replication: A lightweight technique for soft errors
ACM Transactions on Computer Systems (TOCS)
On soft error rate analysis of scaled CMOS designs: a statistical perspective
Proceedings of the 2009 International Conference on Computer-Aided Design
Shoestring: probabilistic soft error reliability on the cheap
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Energy-efficient redundant execution for chip multiprocessors
Proceedings of the 20th symposium on Great lakes symposium on VLSI
A compiler-based infrastructure for fault-tolerant co-design
Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems
Relax: an architectural framework for software recovery of hardware faults
Proceedings of the 37th annual international symposium on Computer architecture
DAFT: decoupled acyclic fault tolerance
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Dynamic processors demand dynamic operating systems
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
IVF: characterizing the vulnerability of microprocessor structures to intermittent faults
Proceedings of the Conference on Design, Automation and Test in Europe
Design techniques for cross-layer resilience
Proceedings of the Conference on Design, Automation and Test in Europe
Multiplexed redundant execution: a technique for efficient fault tolerance in chip multiprocessors
Proceedings of the Conference on Design, Automation and Test in Europe
Performance-asymmetry-aware scheduling for Chip Multiprocessors with static core coupling
Journal of Systems Architecture: the EUROMICRO Journal
On the exploitation of narrow-width values for improving register file reliability
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Microprocessors & Microsystems
Resilience of mutual exclusion algorithms to transient memory faults
Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing
Sampling + DMR: practical and low-overhead permanent fault detection
Proceedings of the 38th annual international symposium on Computer architecture
Multicore performance optimization using partner cores
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
A fault-tolerant, dynamically scheduled pipeline structure for chip multiprocessors
SAFECOMP'11 Proceedings of the 30th international conference on Computer safety, reliability, and security
Statistical Soft Error Rate (SSER) Analysis for Scaled CMOS Designs
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Soft core based embedded systems in critical aerospace applications
Journal of Systems Architecture: the EUROMICRO Journal
Trade-offs in transient fault recovery schemes for redundant multithreaded processors
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Reliability of a softcore processor in a commercial SRAM-based FPGA
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Resource-Driven optimizations for transient-fault detecting superscalar microarchitectures
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Exploiting inactive rename slots for detecting soft errors
ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
Reliable computing with ultra-reduced instruction set co-processors
Proceedings of the 49th Annual Design Automation Conference
Efficient soft error protection for commodity embedded microprocessors using profile information
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Runtime asynchronous fault tolerance via speculation
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Setting an error detection infrastructure with low cost acoustic wave detectors
Proceedings of the 39th Annual International Symposium on Computer Architecture
Fast online error detection and correction with thread signature calculae
Microprocessors & Microsystems
Detection and correction of silent data corruption for large-scale high-performance computing
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Warped-DMR: Light-weight Error Detection for GPGPU
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Self-Adaptive Fault Tolerance in Multi-/Many-Core Systems
Journal of Electronic Testing: Theory and Applications
Fault tolerance for multi-threaded applications by leveraging hardware transactional memory
Proceedings of the ACM International Conference on Computing Frontiers
FaulTM: error detection and recovery using hardware transactional memory
Proceedings of the Conference on Design, Automation and Test in Europe
A work-stealing scheduling framework supporting fault tolerance
Proceedings of the Conference on Design, Automation and Test in Europe
Low cost permanent fault detection using ultra-reduced instruction set co-processors
Proceedings of the Conference on Design, Automation and Test in Europe
IVF: characterizing the vulnerability of microprocessor structures to intermittent faults
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
An adaptive approach for online fault management in many-core architectures
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Virtually-aged sampling DMR: unifying circuit failure prediction and circuit failure detection
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Hi-index | 0.00 |
Exponential growth in the number of on-chip transistors, coupled with reductions in voltage levels, makes each generation of microprocessors increasingly vulnerable to transient faults. In a multithreaded environment, we can detect these faults by running two copies of the same program as separate threads, feeding them identical inputs, and comparing their outputs, a technique we call Redundant Multithreading (RMT).This paper studies RMT techniques in the context of both single- and dual-processor simultaneous multithreaded (SMT) single-chip devices. Using a detailed, commercial-grade, SMT processor design we uncover subtle RMT implementation complexities, and find that RMT can be a more significant burden for single-processor devices than prior studies indicate. However, a novel application of RMT techniques in a dual-processor device, which we term chip-level redundant threading (CRT), shows higher performance than lockstepping the two cores, especially on multithreaded workloads.