An efficient, fully adaptive deadlock recovery scheme: DISHA
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
DIVA: a reliable substrate for deep submicron microarchitecture design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Software-Based Deadlock Recovery Technique for True Fully Adaptive Routing in Wormhole Networks
ICPP '97 Proceedings of the international Conference on Parallel Processing
A Very Efficient Distributed Deadlock Detection Mechanism for Wormhole Networks
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Application of network calculus to general topologies using turn-prohibition
IEEE/ACM Transactions on Networking (TON)
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Analysis of Error Recovery Schemes for Networks on Chips
IEEE Design & Test
Complementary use of runtime validation and model checking
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Exploring Fault-Tolerant Network-on-Chip Architectures
DSN '06 Proceedings of the International Conference on Dependable Systems and Networks
Assertion Checkers in Verification, Silicon Debug and In-Field Diagnosis
ISQED '07 Proceedings of the 8th International Symposium on Quality Electronic Design
A Generic Model for Formally Verifying NoC Communication Architectures: A Case Study
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Engineering trust with semantic guardians
Proceedings of the conference on Design, automation and test in Europe
Argus: Low-Cost, Comprehensive Error Detection in Simple Cores
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Router microarchitecture and scalability of ring topology in on-chip networks
Proceedings of the 2nd International Workshop on Network on Chip Architectures
Next generation on-chip networks: what kind of congestion control do we need?
Hotnets-IX Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks
Functional correctness for CMP interconnects
ICCD '11 Proceedings of the 2011 IEEE 29th International Conference on Computer Design
Functional post-silicon diagnosis and debug for networks-on-chip
Proceedings of the International Conference on Computer-Aided Design
NoCAlert: An On-Line and Real-Time Fault Detection Mechanism for Network-on-Chip Architectures
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Methods for fault tolerance in networks-on-chip
ACM Computing Surveys (CSUR)
uDIREC: unified diagnosis and reconfiguration for frugal bypass of NoC faults
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Post-silicon platform for the functional diagnosis and debug of networks-on-chip
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Hi-index | 0.00 |
As silicon technology scales, modern processors and embedded systems are rapidly shifting towards complex chip multi-processor (CMP) and system-on-chip (SoC) designs, comprising several processor cores and IP components communicating via a network-on-chip (NoC). As a side-effect of this trend, ensuring their correctness has become increasingly problematic. In particular, the network-on-chip often includes complex features and components to support the required communication bandwidth among the nodes in the system. In this landscape, it is no wonder that design errors in the NoC may go undetected and escape into the final silicon, with potential detrimental impact on the overall system. In this work, we propose ForEVeR, a solution that complements the use of formal methods and runtime verification to ensure functional correctness in NoCs. Formal verification, due to its scalability limitations, is used to verify the smaller modules, such as individual router components. We complete the protection against escaped design errors with a runtime technique, a network-level error detection and recovery solution, which monitors the traffic in the NoC and protects it against escaped functional bugs that affect the communication paths in the network. To this end, ForEVeR augments the baseline NoC with a lightweight checker network that alerts destination nodes of incoming packets ahead of time. If a bug is detected, flagged by missed packet arrivals, a recovery mechanism delivers the in-flight data safely to the intended destination via the checker network. ForEVeR's experimental evaluation shows that it can recover from NoC design errors at only 4.8% area cost for an 8x8 mesh interconnect, with a recovery performance cost of less than 30K cycles per functional bug manifestation. Additionally, it incurs no performance overhead in the absence of errors.