A New Approach to Fault-Tolerant Wormhole Routing for Mesh-Connected Parallel Computers
IEEE Transactions on Computers
Microarchitecture and Design Challenges for Gigascale Integration
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
An Efficient Fault-Tolerant Routing Methodology for Meshes and Tori
IEEE Computer Architecture Letters
A survey of research and practices of Network-on-chip
ACM Computing Surveys (CSUR)
Exploring Fault-Tolerant Network-on-Chip Architectures
DSN '06 Proceedings of the International Conference on Dependable Systems and Networks
Reliability challenges for 45nm and beyond
Proceedings of the 43rd annual Design Automation Conference
On-line Fault Detection and Location for NoC Interconnects
IOLTS '06 Proceedings of the 12th IEEE International Symposium on On-Line Testing
Online NoC Switch Fault Detection and Diagnosis Using a High Level Fault Model
DFT '07 Proceedings of the 22nd IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems
IEEE Transactions on Computers
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Adaptive stochastic routing in fault-tolerant on-chip networks
NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
Addressing Manufacturing Challenges with Cost-Efficient Fault Tolerant Routing
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Fault-Tolerant Flow Control in On-chip Networks
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
A highly resilient routing algorithm for fault-tolerant NoCs
Proceedings of the Conference on Design, Automation and Test in Europe
NoCAlert: An On-Line and Real-Time Fault Detection Mechanism for Network-on-Chip Architectures
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
A resilient architecture for low latency communication in shared-L1 processor clusters
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Methods for fault tolerance in networks-on-chip
ACM Computing Surveys (CSUR)
Online traffic-aware fault detection for networks-on-chip
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
A new distributed on-line test mechanism for NoCs is proposed which scales to large-scale networks with general topologies and routing algorithms. Each router and its links are tested using neighbors in different phases. Only the router under test is in test mode and all other parts of the NoC are in functional mode. Experimental results show that our on-line test approach can detect stuck-at and short-wire faults in the routers and links. Our approach achieves 100% fault coverage for the data-path and 85% for the control paths including routing logic, FIFO's control path and the arbiter of a 5x5 router. Synthesis results show that the hardware overhead of our test components with TMR (Triple Module Redundancy) support is 20% for covering both stuck-at and short-wire faults and 7% for covering only stuck-at faults in the 5x5 router. Simulation results show that our on-line testing approach has an average latency overhead of 20% and 3% in synthetic traffic and PARSEC traffic benchmarks on an 8x8 NoC, respectively.