We exploit the inherent information redundancy in the control path of Network-on-Chip (NoC) routers to manage transient errors, preventing packet loss and misrouting. Unlike fault-tolerant routing, our method does not drop packets when faults occur in routers, and thus does not increase the burden on neighboring routers. Unlike transmission over NoC interconnect links, the routing operation is nonlinear, so standard error control coding methods cannot be applied to it. Instead, our method exploits information redundancy that already exists in the router, significantly reducing area overhead and power consumption compared to triple-modular redundancy (TMR). We provide an analytical reliability model of our method whose parameters include circuit size, separate error rates for logic gates and registers, and the location of the faulty element. Compared to TMR, the proposed method improves arbiter reliability by two orders of magnitude while reducing total power and area by 43% and 64%, respectively. Simulations of a 4x4 NoC show that our method reduces average latency by up to 90% relative to an unprotected design and by up to 12% relative to TMR.
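For readers unfamiliar with the TMR baseline named in the abstract, the following is a minimal sketch of generic triple-modular redundancy, not of the proposed method. It assumes three independent copies of a module and an ideal (fault-free) voter; the function names and the 4-bit grant vector are hypothetical and chosen only for illustration.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over three redundant copies of a signal."""
    return (a & b) | (a & c) | (b & c)


def tmr_reliability(r: float) -> float:
    """Probability that at least two of three independent modules are correct.

    Assumes each module is correct with probability r and that the voter
    itself never fails (an idealization).
    """
    return 3 * r**2 - 2 * r**3


# Hypothetical example: a 4-bit arbiter grant vector with one copy
# corrupted by a single-bit transient error.
grant = 0b0100
faulty_copy = grant ^ 0b0001
voted = majority_vote(grant, faulty_copy, grant)
```

The voter masks any error confined to one copy, at the cost of tripling the protected logic, which is the area and power overhead the abstract compares against.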