Functional post-silicon diagnosis and debug for networks-on-chip

Authors:
Rawan Abdel-Khalek;Valeria Bertacco
Affiliations:
University of Michigan;University of Michigan
Venue:
Proceedings of the International Conference on Computer-Aided Design
Year:
2012

Abstract

Networks-on-chip (NoCs) have emerged as a favorable solution for providing higher-bandwidth interconnects in large chip multiprocessors (CMPs). To enhance the interconnect's performance, the NoC is often designed with complex components and advanced features. As complexity and size increase, ensuring the functional correctness of the NoC becomes particularly challenging. This challenge pervades the entire verification effort, and post-silicon validation in particular, because the network's complex internal operation is difficult to observe. We propose a post-silicon validation platform that enhances the observability of network activity by periodically taking snapshots of the packets in flight. Each node's local cache is configured to store the snapshot logs in a temporary space that is allocated for post-silicon validation and released at deployment. Each snapshot log is periodically analyzed locally by a software algorithm running on the node's processor core in order to detect functional errors. If an error is detected, the snapshot logs are aggregated and additional debug data is extracted. This data includes an overview of the network traffic around the time the error manifested, as well as a partial reconstruction of the routes followed by the packets in flight. In our experiments, this approach allowed us to detect several types of functional errors, to observe over 50% of the network's traffic on average, and to reconstruct at least half of each observed packet's route through the network.
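
The flow described in the abstract (per-node snapshot logs, a local software check, then aggregation and partial route reconstruction) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the `Sample` record layout, the residency-based check in `local_check`, the `max_residency` threshold, and the merge-by-cycle logic in `reconstruct_routes` are all assumptions made here for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple


class Sample(NamedTuple):
    """One snapshot entry (hypothetical layout, not taken from the paper)."""
    cycle: int      # cycle at which the snapshot was taken
    node: int       # node that logged the packet
    packet_id: int  # assumed unique packet identifier


def local_check(log, max_residency):
    """Per-node check: flag packets observed at this node over a span of
    more than max_residency cycles (an assumed symptom of a stuck packet)."""
    first, last = {}, {}
    for s in log:
        first[s.packet_id] = min(first.get(s.packet_id, s.cycle), s.cycle)
        last[s.packet_id] = max(last.get(s.packet_id, s.cycle), s.cycle)
    return sorted(p for p in first if last[p] - first[p] > max_residency)


def reconstruct_routes(logs):
    """Aggregate all node logs and rebuild, for each packet, the ordered
    list of nodes at which it was observed. The result is only a partial
    route: hops that fall between snapshots are never seen."""
    seen = defaultdict(list)
    for log in logs.values():
        for s in log:
            seen[s.packet_id].append((s.cycle, s.node))
    routes = {}
    for pid, observations in seen.items():
        route = []
        for _, node in sorted(observations):
            # Collapse repeated sightings at the same node.
            if not route or route[-1] != node:
                route.append(node)
        routes[pid] = route
    return routes
```

In this sketch, `local_check` would run on each node against its own log, and `reconstruct_routes` would run only after some node reports a suspect packet, mirroring the two-phase structure described in the abstract.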