Route packets, not wires: on-chip inteconnection networks
Proceedings of the 38th annual Design Automation Conference
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Replacing global wires with an on-chip network: a power analysis
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Express virtual channels: towards the ideal interconnection fabric
Proceedings of the 34th annual international symposium on Computer architecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Physical vs. Virtual Express Topologies with Low-Swing Links for Future Many-Core NoCs
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
ORION 2.0: a fast and accurate NoC power and area model for early-stage design space exploration
Proceedings of the Conference on Design, Automation and Test in Europe
Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the Conference on Design, Automation and Test in Europe
Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Analytical performance modeling of shuffle-exchange inspired mesh-based Network-on-Chips
Performance Evaluation
TornadoNoC: A lightweight and scalable on-chip network architecture for the many-core era
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
In this paper, we present a case study of our chip prototype of a 16-node 4x4 mesh NoC fabricated in 45nm SOI CMOS that aims to simultaneously optimize energy-latency-throughput for unicasts, multicasts and broadcasts. We first define and analyze the theoretical limits of a mesh NoC in latency, throughput and energy, then describe how we approach these limits through a combination of microarchitecture and circuit techniques. Our 1.1V 1GHz NoC chip achieves 1-cycle router-and-link latency at each hop and energy-efficient router-level multicast support, delivering 892Gb/s (87.1% of the theoretical bandwidth limit) at 531.4mW for a mixed traffic of unicasts and broadcasts. Through this fabrication, we derive insights that help guide our research, and we believe, will also be useful to the NoC and multicore research community.