DIVA: a reliable substrate for deep submicron microarchitecture design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
The Reliable Router: A Reliable and High-Performance Communication Substrate for Parallel Computers
PCRCW '94 Proceedings of the First International Workshop on Parallel Computer Routing and Communication
Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
A fault model notation and error-control scheme for switch-to-switch buses in a network-on-chip
Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
The Soft Error Problem: An Architectural Perspective
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Analysis of Error Recovery Schemes for Networks on Chips
IEEE Design & Test
Towards on-chip fault-tolerant communication
ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
Exploring Fault-Tolerant Network-on-Chip Architectures
DSN '06 Proceedings of the International Conference on Dependable Systems and Networks
Efficient High Hamming Distance CRCs for Embedded Networks
DSN '06 Proceedings of the International Conference on Dependable Systems and Networks
A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks
Proceedings of the 33rd annual international symposium on Computer Architecture
Resilient dynamic power management under uncertainty
Proceedings of the conference on Design, automation and test in Europe
Multicast routing with dynamic packet fragmentation
Proceedings of the 19th ACM Great Lakes symposium on VLSI
Fault-tolerant architecture and deflection routing for degradable NoC switches
NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
Dynamic packet fragmentation for increased virtual channel utilization in on-chip routers
NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
A distributed and topology-agnostic approach for on-line NoC testing
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Benefits of selective packet discard in networks-on-chip
ACM Transactions on Architecture and Code Optimization (TACO)
A unified link-layer fault-tolerant architecture for network-based many-core embedded systems
Journal of Systems Architecture: the EUROMICRO Journal
Journal of Systems Architecture: the EUROMICRO Journal
Hi-index | 0.00 |
Scaling of interconnects exacerbates the already challenging reliability of on-chip networks. Although many researchers have provided various fault handling techniques in chip multi-processors (CMPs), the fault-tolerance of the interconnection network is yet to adequately evolve. As an end-to-end recovery approach delays fault detection and complicates recovery to a consistent global state in such a system, a link-level retransmission is endorsed for recovery, making a higher-level protocol simple. In this paper, we introduce a fault-tolerant flow control scheme for soft error handling in on-chip networks. The fault-tolerant flow control recovers errors at a link-level by requesting retransmission and ensures an error-free transmission on a flit-basis with incorporation of dynamic packet fragmentation. Dynamic packet fragmentation is adopted as a part of fault-tolerant flow control to disengage flits from the fault-containment and recover the faulty flit transmission. Thus, the proposed router provides a high level of dependability at the link-level for both datapath and control planes. In simulation with injected faults, the proposed router is observed to perform well, gracefully degrading while exhibiting 97% error coverage in datapath elements. The proposed router has been implemented using a TSMC 45nm standard cell library. As compared to a router which employs triple modular redundancy (TMR) in datapath elements, the proposed router takes 58% less area and consumes 40% less energy per packet on average.