Fault-tolerant computer system design
Fault-tolerant computer system design
Fault-Tolerance Through Scheduling of Aperiodic Tasks in Hard Real-Time Multiprocessor Systems
IEEE Transactions on Parallel and Distributed Systems
A Fault-Tolerant Dynamic Scheduling Algorithm for Multiprocessor Real-Time Systems and Its Analysis
IEEE Transactions on Parallel and Distributed Systems
Soft-Error Detection through Software Fault-Tolerance Techniques
DFT '99 Proceedings of the 14th International Symposium on Defect and Fault-Tolerance in VLSI Systems
A Software Fault Tolerance Method for Safety-Critical Systems: Effectiveness and Drawbacks
Proceedings of the 15th symposium on Integrated circuits and systems design
Time Redundancy Based Soft-Error Tolerance to Rescue Nanometer Technologies
VTS '99 Proceedings of the 1999 17TH IEEE VLSI Test Symposium
Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
SWIFT: Software Implemented Fault Tolerance
Proceedings of the international symposium on Code generation and optimization
Proceedings of the 42nd annual Design Automation Conference
Neutron SER Characterization of Microprocessors
DSN '05 Proceedings of the 2005 International Conference on Dependable Systems and Networks
Exploring Fault-Tolerant Network-on-Chip Architectures
DSN '06 Proceedings of the International Conference on Dependable Systems and Networks
Prototyping a fault-tolerant multiprocessor SoC with run-time fault recovery
Proceedings of the 43rd annual Design Automation Conference
Reunion: Complexity-Effective Multicore Redundancy
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
DSN '07 Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
Verification-guided soft error resilience
Proceedings of the conference on Design, automation and test in Europe
Reliable Network-on-Chip Using a Low Cost Unequal Error Protection Code
DFT '07 Proceedings of the 22nd IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems
Crosstalk- and SEU-Aware Networks on Chips
IEEE Design & Test
A Low-Power and SEU-Tolerant Switch Architecture for Network on Chips
PRDC '07 Proceedings of the 13th Pacific Rim International Symposium on Dependable Computing
Globally optimized robust systems to overcome scaled CMOS reliability challenges
Proceedings of the conference on Design, automation and test in Europe
Soft-error resilience of the IBM POWER6 processor input/output subsystem
IBM Journal of Research and Development
A case study of on-chip sensor network in multiprocessor system-on-chip
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Analysis and optimization of fault-tolerant embedded systems with hardened processors
Proceedings of the Conference on Design, Automation and Test in Europe
Satisfiability Modulo Graph Theory for Task Mapping and Scheduling on Multiprocessor Systems
IEEE Transactions on Parallel and Distributed Systems
A NoC Traffic Suite Based on Real Applications
ISVLSI '11 Proceedings of the 2011 IEEE Computer Society Annual Symposium on VLSI
Error Tolerance in Server Class Processors
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.00 |
As transistor density continues to increase with the advent of nanotechnology, reliability issues raised by the more frequent appearance of soft errors are becoming critical for future embedded multiprocessor systems design. State-of-the-art techniques for soft error protections targeting multiprocessor systems result either high chip cost and area overhead or high performance degradation and energy consumption, and do not fulfill the increasing requirements for high performance and dependability. In this article we present a systematic approach, that is, the Sensor Networks-on-Chip (SENoC), to collaboratively and efficiently manage on-chip applications and overcome reliability threats to Multiprocessor Systems-on-Chip (MPSoC). A hardware-software collaborative approach is proposed to solve soft error problems: a hardware-based on-chip sensor network is built for soft error detection, and a software-based recovery mechanism is applied for soft error correction. A two-step scheduling scheme is presented for reliable application and chip management, combining an off-line static optimization stage for application performance maximization and an online lightweight dynamic adjustment stage to handle runtime variations and exceptions. This strategy introduces only trivial overhead on hardware design and much lower overhead on software control and execution, and hence performance degradation and energy consumption is greatly reduced. We build a cycle-accurate simulator using SystemC, and verify the effectiveness of our technique by comparing performance with related techniques on several real-world applications.