Post-silicon platform for the functional diagnosis and debug of networks-on-chip

Authors:
Rawan Abdel-Khalek;Valeria Bertacco
Affiliations:
University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI
Venue:
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Year:
2014

Abstract

The increasing number of units in today's systems-on-chip and multicore processors has led to complex intra-chip communication solutions. Specifically, Networks-on-Chip (NoCs) have emerged as a favorable fabric to provide high bandwidth and low latency when connecting many units on the same chip. To achieve these goals, the NoC often includes complex components and advanced features, leading to large and highly complex interconnect subsystems. One of the biggest challenges in these designs is ensuring the correct functionality of this communication infrastructure. To support this goal, an increasing fraction of the validation effort has shifted to post-silicon validation, because it permits exercising network activities that are too complex to be validated pre-silicon. However, post-silicon validation is hindered by the lack of observability of the network's internal operations, and thus diagnosing functional errors during this phase is very difficult. In this work, we propose a post-silicon validation platform that improves observability of network operations by taking periodic snapshots of the traffic traversing the network. Each node's local cache is configured to temporarily store the snapshot logs in a designated area that is reserved for post-silicon validation and relinquished after product release. Each snapshot log is analyzed locally by a software algorithm running on its corresponding core in order to detect functional errors. Upon error detection, all snapshot logs are aggregated at a central location to extract additional debug data, including an overview of the network traffic surrounding the error event and a partial reconstruction of the routes followed by the packets in flight at the time. In our experiments, we found that this approach allows us to detect several types of functional errors, to observe, on average, over 50% of the network's traffic, and to reconstruct at least half of the route of each observed packet.
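
The two-phase flow described in the abstract (a local check of each node's snapshot log, followed by central aggregation to rebuild partial routes) can be illustrated with a minimal sketch. This is not the authors' implementation: the `LogEntry` fields, the duplicate-observation check in `local_check`, and the assumption that timestamps are comparable across nodes are all hypothetical choices made for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple


class LogEntry(NamedTuple):
    """One sampled packet observation (hypothetical log format)."""
    packet_id: int
    src: int
    dst: int
    node: int   # router whose snapshot recorded the packet
    cycle: int  # observation time, assumed comparable across nodes


def local_check(log):
    """Return IDs of packets seen more than once in one node's log.

    A repeated observation at the same node is used here as a stand-in
    for a locally detectable functional error (e.g. a packet that loops
    back). The checks used in the paper are not specified in the abstract.
    """
    seen, suspects = set(), set()
    for entry in log:
        if entry.packet_id in seen:
            suspects.add(entry.packet_id)
        seen.add(entry.packet_id)
    return suspects


def reconstruct_routes(logs):
    """Aggregate all per-node logs and order each packet's sightings by time.

    The result is a partial route per packet: only nodes whose snapshots
    happened to capture the packet appear in it.
    """
    by_packet = defaultdict(list)
    for log in logs:
        for entry in log:
            by_packet[entry.packet_id].append(entry)
    return {
        pid: [e.node for e in sorted(entries, key=lambda e: e.cycle)]
        for pid, entries in by_packet.items()
    }


if __name__ == "__main__":
    # Three nodes; packet 1 is observed twice at node 1.
    logs = [
        [LogEntry(1, 0, 2, 0, 5), LogEntry(2, 0, 1, 0, 6)],
        [LogEntry(1, 0, 2, 1, 8), LogEntry(1, 0, 2, 1, 20)],
        [LogEntry(1, 0, 2, 2, 11)],
    ]
    for node, log in enumerate(logs):
        print(node, local_check(log))
    print(reconstruct_routes(logs))
```

In this sketch the local check runs independently on each log, mirroring the per-core analysis in the abstract, and aggregation would be triggered only after some node reports a suspect packet.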