Deadlock-Free Message Routing in Multiprocessor Interconnection Networks
IEEE Transactions on Computers
The turn model for adaptive routing
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The Odd-Even Turn Model for Adaptive Routing
IEEE Transactions on Parallel and Distributed Systems
ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Communication in Multicomputers with Nonconvex Faults
IEEE Transactions on Computers
Fault-Tolerant Wormhole Routing in Meshes without Virtual Channels
IEEE Transactions on Parallel and Distributed Systems
A Flexible Routing Scheme for Networks of Workstations
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Deadlock-Free Dynamic Reconfiguration Schemes for Increased Network Dependability
IEEE Transactions on Parallel and Distributed Systems
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Immunet: A Cheap and Robust Fault-Tolerant Packet Routing Mechanism
Proceedings of the 31st annual international symposium on Computer architecture
An Effective Methodology to Improve the Performance of the Up*/Down* Routing Algorithm
IEEE Transactions on Parallel and Distributed Systems
The Impact of Technology Scaling on Lifetime Reliability
DSN '04 Proceedings of the 2004 International Conference on Dependable Systems and Networks
Analysis of Error Recovery Schemes for Networks on Chips
IEEE Design & Test
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
An Efficient Fault-Tolerant Routing Methodology for Meshes and Tori
IEEE Computer Architecture Letters
A survey of research and practices of Network-on-chip
ACM Computing Surveys (CSUR)
Exploring Fault-Tolerant Network-on-Chip Architectures
DSN '06 Proceedings of the International Conference on Dependable Systems and Networks
A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks
Proceedings of the 33rd annual international symposium on Computer Architecture
Test Configurations for Diagnosing Faulty Links in NoC Switches
ETS '07 Proceedings of the 12th IEEE European Test Symposium
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
An Efficient and Deadlock-Free Network Reconfiguration Protocol
IEEE Transactions on Computers
A Lightweight Fault-Tolerant Mechanism for Network-on-Chip
NOCS '08 Proceedings of the Second ACM/IEEE International Symposium on Networks-on-Chip
IEEE Transactions on Computers
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
The StageNet fabric for constructing resilient multicore systems
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Fault-tolerant architecture and deflection routing for degradable NoC switches
NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
Vicis: a reliable network for unreliable silicon
Proceedings of the 46th Annual Design Automation Conference
Fault-Tolerant Routing Algorithm for Network on Chip without Virtual Channels
DFT '09 Proceedings of the 2009 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems
Leveraging partially faulty links usage for enhancing yield and performance in networks-on-chip
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Configurable links for runtime adaptive on-chip communication
Proceedings of the Conference on Design, Automation and Test in Europe
ORION 2.0: a fast and accurate NoC power and area model for early-stage design space exploration
Proceedings of the Conference on Design, Automation and Test in Europe
Segment-based routing: an efficient fault-tolerant routing algorithm for meshes and Tori
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Cycles, cells and platters: an empirical analysisof hardware failures on a million consumer PCs
Proceedings of the sixth conference on Computer systems
DRAIN: distributed recovery architecture for inaccessible nodes in multi-core chips
Proceedings of the 48th Design Automation Conference
ARIADNE: Agnostic Reconfiguration in a Disconnected Network Environment
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
A systematic methodology to develop resilient cache coherence protocols
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Formally enhanced runtime verification to ensure NoC functional correctness
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Autonet: a high-speed, self-configuring local area network using point-to-point links
IEEE Journal on Selected Areas in Communications
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Viper: virtual pipelines for enhanced reliability
Proceedings of the 39th Annual International Symposium on Computer Architecture
NoCAlert: An On-Line and Real-Time Fault Detection Mechanism for Network-on-Chip Architectures
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
As silicon continues to scale, transistor reliability is becoming a major concern. At the same time, increasing transistor counts are causing a rapid shift towards large chip multi-processors (CMP) and system-on-chip (SoC) designs, comprising several cores and IPs communicating via a network-on-chip (NoC). As the sole medium of on-chip communication, a NoC should gracefully tolerate many permanent faults. We propose uDIREC, a unified framework for permanent fault diagnosis and subsequent reconfiguration in NoCs that provides graceful performance degradation with increasing number of faults. Upon in-field transistor failures, uDIREC leverages a fine-resolution diagnosis mechanism to disable faulty components very sparingly. At its core, uDIREC employs a novel routing algorithm to find reliable and deadlock-free routes that utilize the still-functional links in the NoC. uDIREC places no restriction on topology, router architecture and number and location of faults. Experimental results show that uDIREC, implemented in a 64-node NoC, drops 3x fewer nodes and provides 25% higher throughput (beyond 15 faults) when compared to other state-of-the-art fault-tolerance solutions. uDIREC's improvement over prior-art grows with more faults, making it a suitable NoC reliability solution for a wide range of fault rates.