The widespread adoption of the Chip Multi-Processor (CMP) paradigm has made the on-chip interconnection fabric a critical system component. At the same time, the Network-on-Chip (NoC) is becoming increasingly susceptible to emerging reliability threats: as technology feature sizes shrink into the nanoscale regime, reliability and process-variability artifacts within the NoC become prominent, and detecting faults at run-time is steadily becoming imperative. In this work, we propose NoCAlert, a comprehensive on-line and real-time fault detection mechanism that demonstrates 0% false negatives within the interconnect, for the fault model and stimulus set used in this study. Based on the concept of invariance checking, NoCAlert employs a group of lightweight micro-checker modules that collectively implement real-time hardware assertions. The checkers operate seamlessly and concurrently with normal NoC operation, thus eliminating the need for periodic or trigger-based self-testing. More importantly, 97% of the faults are detected instantaneously. Extensive cycle-accurate simulations in a 64-node CMP demonstrate the efficacy of the proposed technique. Finally, hardware synthesis results using commercial 65 nm technology libraries indicate minimal area and power overheads of 3% and less than 1%, respectively, and negligible impact on the router's critical path.
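To make the idea of invariance checking concrete, the following is a minimal software sketch of what a single micro-checker might assert. It is not taken from NoCAlert itself: the choice of a router arbiter as the checked module, the function name `check_arbiter_invariants`, the bit-vector encoding of requests and grants, and the three invariants are all illustrative assumptions, and the third invariant additionally assumes a work-conserving arbiter.

```python
from typing import List


def check_arbiter_invariants(requests: int, grants: int) -> List[str]:
    """Hypothetical micro-checker for one arbitration cycle.

    `requests` and `grants` are non-negative bit vectors, one bit per
    input port (an assumed encoding, not one taken from the paper).
    Returns the names of all invariants violated in this cycle.
    """
    violations = []

    # Invariant 1: a grant may only go to a port that requested it.
    if grants & ~requests:
        violations.append("grant_without_request")

    # Invariant 2: at most one grant per cycle (vector is one-hot or zero).
    if grants & (grants - 1):
        violations.append("multiple_grants")

    # Invariant 3 (assumes a work-conserving arbiter): if any port
    # requests, some port must be granted.
    if requests and not grants:
        violations.append("no_grant_despite_request")

    return violations


# Example: ports 1 and 2 request, but port 0 is granted.
print(check_arbiter_invariants(0b0110, 0b0001))
```

Each check looks only at the module's inputs and outputs in the current cycle, which is what would allow such assertions to run concurrently with normal operation rather than in a separate test phase.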