Computing Architectural Vulnerability Factors for Address-Based Structures
Proceedings of the 32nd annual international symposium on Computer Architecture
Fault Tolerance Techniques for the Merrimac Streaming Supercomputer
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
ExtraVirt: detecting and recovering from transient processor faults
Proceedings of the twentieth ACM symposium on Operating systems principles
Software-Based Transparent and Comprehensive Control-Flow Error Detection
Proceedings of the International Symposium on Code Generation and Optimization
A Simulation-Based Soft Error Estimation Methodology for Computer Systems
ISQED '06 Proceedings of the 7th International Symposium on Quality Electronic Design
Operating system issues for petascale systems
ACM SIGOPS Operating Systems Review
Interconnect-Aware Coherence Protocols for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Software-Based Adaptive and Concurrent Self-Testing in Programmable Network Interfaces
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Evaluating instruction cache vulnerability to transient errors
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
Reunion: Complexity-Effective Multicore Redundancy
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Architecting a reliable CMP switch architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Exploiting soft redundancy for error-resilient on-chip memory design
Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
Compiler-Managed Software-based Redundant Multi-Threading for Transient Fault Detection
Proceedings of the International Symposium on Code Generation and Optimization
Modeling and improving data cache reliability: 1
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A low-SER efficient core processor architecture for future technologies
Proceedings of the conference on Design, automation and test in Europe
Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
The visual vulnerability spectrum: characterizing architectural vulnerability for graphics hardware
GH '06 Proceedings of the 21st ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Zyzzyva: speculative byzantine fault tolerance
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Software-Based Failure Detection and Recovery in Programmable Network Interfaces
IEEE Transactions on Parallel and Distributed Systems
Evaluating instruction cache vulnerability to transient errors
ACM SIGARCH Computer Architecture News
The Cray BlackWidow: a highly scalable vector multiprocessor
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Nysiad: practical protocol transformation to tolerate Byzantine failures
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
Applying Safety Goals to a New Intensive Care Workstation System
SAFECOMP '08 Proceedings of the 27th international conference on Computer Safety, Reliability, and Security
Scalable and reliable communication for hardware transactional memory
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Robust FPGA resynthesis based on fault-tolerant Boolean matching
Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Improving error tolerance for multithreaded register files
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Adopting the Drowsy Technique for Instruction Caches: A Soft Error Perspective
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
LADIS '08 Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware
Self-recovery in server programs
Proceedings of the 2009 international symposium on Memory management
DRAM errors in the wild: a large-scale field study
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
A hybrid nano-CMOS architecture for defect and fault tolerance
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Improving cache lifetime reliability at ultra-low voltages
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Exploiting memory soft redundancy for joint improvement of error tolerance and access efficiency
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Power-efficient, reliable microprocessor architectures: modeling and design methods
Proceedings of the 20th symposium on Great lakes symposium on VLSI
Fault-tolerant cache coherence protocols for CMPs: evaluation and trade-offs
HiPC'08 Proceedings of the 15th international conference on High performance computing
Use ECP, not ECC, for hard failures in resistive memories
Proceedings of the 37th annual international symposium on Computer architecture
Fault-Tolerant Flow Control in On-chip Networks
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Modeling soft errors for data caches and alleviating their effects on data reliability
Microprocessors & Microsystems
Replica victim caching to improve cache reliability against transient errors
International Journal of High Performance Systems Architecture
Multicore soft error rate stabilization using adaptive dual modular redundancy
Proceedings of the Conference on Design, Automation and Test in Europe
Design techniques for cross-layer resilience
Proceedings of the Conference on Design, Automation and Test in Europe
Performance-asymmetry-aware scheduling for Chip Multiprocessors with static core coupling
Journal of Systems Architecture: the EUROMICRO Journal
OE+IOE: a novel turn model based fault tolerant routing scheme for networks-on-chip
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
DRAM errors in the wild: a large-scale field study
Communications of the ACM
Fault-tolerant resynthesis with dual-output LUTs
Proceedings of the 2010 Asia and South Pacific Design Automation Conference
Proceedings of the 16th Asia and South Pacific Design Automation Conference
Feedback control based cache reliability enhancement for emerging multicores
Proceedings of the International Conference on Computer-Aided Design
Improving System Energy Efficiency with Memory Rank Subsetting
ACM Transactions on Architecture and Code Optimization (TACO)
On power and fault-tolerance optimization in FPGA physical synthesis
Proceedings of the International Conference on Computer-Aided Design
RISE: improving the streaming processors reliability against soft errors in gpgpus
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Dynamic code duplication with vulnerability awareness for soft error detection on VLIW architectures
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Journal of Computer and System Sciences
CEP: Correlated Error Propagation for Hierarchical Soft Error Analysis
Journal of Electronic Testing: Theory and Applications
Cost-effective soft-error protection for SRAM-based structures in GPGPUs
Proceedings of the ACM International Conference on Computing Frontiers
Memory array protection: check on read or check on write?
Proceedings of the Conference on Design, Automation and Test in Europe
ACR: automatic checkpoint/restart for soft and hard error protection
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.02 |
This paper gives a broad overview of the soft error problem from an architectural perspective. We start with basic definitions followed by a description of techniques to compute the softerror rate. Then, we summarize techniques used to reduce the soft error rate. Finally, we outline future directions for architecture research in soft errors.