Optimistic recovery in distributed systems
ACM Transactions on Computer Systems (TOCS)
Dynamic dependency analysis of ordinary programs
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Accelerated testing for cosmic soft-error rate
IBM Journal of Research and Development - Special issue: terrestrial cosmic rays and soft errors
Reliable computer systems (3rd ed.): design and evaluation
Reliable computer systems (3rd ed.): design and evaluation
DIVA: a reliable substrate for deep submicron microarchitecture design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Transient fault detection via simultaneous multithreading
Proceedings of the 27th annual international symposium on Computer architecture
Probability and statistics with reliability, queuing and computer science applications
Probability and statistics with reliability, queuing and computer science applications
Transient-fault recovery using simultaneous multithreading
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Detailed design and evaluation of redundant multithreading alternatives
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
IBM's S/390 G5 Microprocessor Design
IEEE Micro
Power4 System Design for High Reliability
IEEE Micro
FIAT: A Framework for Interprocedural Analysis and Transfomation
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Speculative Sequential Consistency with Little Custom Storage
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors
FTCS '99 Proceedings of the Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing
The architecture of Tandem's NonStop system
ACM '81 Proceedings of the ACM '81 conference
Rules of Thumb in Data Engineering
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Transient-fault recovery for chip multiprocessors
Proceedings of the 30th annual international symposium on Computer architecture
Early Error Detection in Systems-on-Chip for Fault-Tolerance and At-Speed Debugging
VTS '01 Proceedings of the 19th IEEE VLSI Test Symposium
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Fault Tolerance Techniques for the Merrimac Streaming Supercomputer
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
ReStore: Symptom-Based Soft Error Detection in Microprocessors
IEEE Transactions on Dependable and Secure Computing
Ultra low-cost defect protection for microprocessor pipelines
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
SlicK: slice-based locality exploitation for efficient redundant multithreading
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Reunion: Complexity-Effective Multicore Redundancy
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Architecting a reliable CMP switch architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Dynamic prediction of architectural vulnerability from microarchitectural state
Proceedings of the 34th annual international symposium on Computer architecture
Transient fault prediction based on anomalies in processor events
Proceedings of the conference on Design, automation and test in Europe
Implementing high availability memory with a duplication cache
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Synchronizing redundant cores in a dynamic DMR multicore architecture
IEEE Transactions on Circuits and Systems II: Express Briefs
Energy-efficient redundant execution for chip multiprocessors
Proceedings of the 20th symposium on Great lakes symposium on VLSI
Multiplexed redundant execution: a technique for efficient fault tolerance in chip multiprocessors
Proceedings of the Conference on Design, Automation and Test in Europe
Analysis of checksum-based execution schemes for pipelined processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A resilient on-chip router design through data path salvaging
Proceedings of the 16th Asia and South Pacific Design Automation Conference
Characterizing the impact of soft errors on iterative methods in scientific computing
Proceedings of the international conference on Supercomputing
Sampling + DMR: practical and low-overhead permanent fault detection
Proceedings of the 38th annual international symposium on Computer architecture
Cost-effective safety and fault localization using distributed temporal redundancy
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Trade-offs in transient fault recovery schemes for redundant multithreaded processors
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
A survey and taxonomy of on-chip monitoring of multicore systems-on-chip
ACM Transactions on Design Automation of Electronic Systems (TODAES)
An adaptive low-overhead mechanism for dependable general-purpose many-core processors
ICT-EurAsia'13 Proceedings of the 2013 international conference on Information and Communication Technology
Fault tolerance for multi-threaded applications by leveraging hardware transactional memory
Proceedings of the ACM International Conference on Computing Frontiers
FaulTM: error detection and recovery using hardware transactional memory
Proceedings of the Conference on Design, Automation and Test in Europe
A survey of checker architectures
ACM Computing Surveys (CSUR)
Building timing predictable embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
Recent studies have suggested that the soft-error rate in microprocessor logic will become a reliability concern by 2010. This paper proposes an efficient error detection technique, called fingerprinting, that detects differences in execution across a dual modular redundant (DMR) processor pair. Fingerprinting summarizes a processor's execution history in a hash-based signature; differences between two mirrored processors are exposed by comparing their fingerprints. Fingerprinting tightly bounds detection latency and greatly reduces the interprocessor communication bandwidth required for checking. This paper presents a study that evaluates fingerprinting against a range of current approaches to error detection. The result of this study shows that fingerprinting is the only error detection mechanism that simultaneously allows high-error coverage, low error detection bandwidth, and high I/O performance.