ACM SIGARCH Computer Architecture News
Architectural simulations of a fast, source-synchronous ring-based Network-on-Chip design
ICCD '12 Proceedings of the 2012 IEEE 30th International Conference on Computer Design (ICCD 2012)
Most existing Network-on-Chip (NoC) designs operate at a clock speed equal to or lower than that of the processing elements (PEs). Recently, a source-synchronous ring-based NoC architecture was proposed that runs significantly faster than the PEs and offers significantly higher bandwidth and lower communication latency. However, that ring-based design assumes separate clock distribution schemes for the NoC and the PEs, and uses a standard mesh topology for the NoC. In this work, we present a source-synchronous ring-based NoC laid out in an H-tree topology, with each data link routed parallel to a clock ring. The clock is generated and distributed by multiple standing wave oscillator (SWO) rings, which are also laid out in an H-tree topology. Our design allows the PEs to extract a low-jitter clock directly from the high-speed ring-based SWO clock by division. Moreover, since the PEs are synchronous with the ring clock, they do not need synchronizers when communicating with the NoC. We also show that recursively duplicating links in the H-tree-based source-synchronous NoC (Hnoc) yields new hybrid NoC structures. In the limit, this recursive duplication causes the H-tree-based NoC to morph into the mesh-based source-synchronous NoC (Mnoc). The performance of each intermediate hybrid NoC structure is quantified in terms of area, link utilization, and contention-free latency. We also enhance the performance of the hybrid NoCs by widening congested links, and quantify the resulting tradeoffs. Experimental results show that the hybrid NoC designs provide significantly lower latency (up to 5× lower) and sustain a higher injection rate (up to 6.8× higher) than a state-of-the-art mesh. Moreover, these hybrid NoC designs use fewer buffers (up to 19.4% fewer) and less wire length (up to 19.7% less) than a mesh. Based on the performance and area tradeoffs, an NoC designer can select any of the hybrid NoC structures presented.
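The gap between a pure tree and a pure mesh, which the hybrid structures interpolate between, can be illustrated with a toy hop-count comparison. The sketch below is not the paper's model. It assumes a hypothetical 2^k × 2^k grid of tiles, treats the H-tree as a plain quad-tree with one hop per level climbed or descended, and ignores the source-synchronous rings, SWO clocking, link widths, and contention. All function names are illustrative.

```python
def mesh_hops(a, b):
    """Manhattan distance between tiles a=(x, y) and b=(x, y) on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def tree_hops(a, b, levels):
    """Hops between two leaf tiles of a quad-tree with `levels` levels.

    Assumption: climb from both leaves until they share a quadrant;
    each level climbed costs one hop up and one hop down.
    """
    for up in range(levels + 1):
        if (a[0] >> up, a[1] >> up) == (b[0] >> up, b[1] >> up):
            return 2 * up
    raise ValueError("tiles lie outside the 2**levels x 2**levels grid")


def average_hops(hops, side):
    """Mean hop count over all ordered pairs of distinct tiles."""
    tiles = [(x, y) for x in range(side) for y in range(side)]
    pairs = [(a, b) for a in tiles for b in tiles if a != b]
    return sum(hops(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    levels = 2                      # 4 x 4 grid of tiles
    side = 2 ** levels
    print("mesh average hops:", average_hops(mesh_hops, side))
    print("tree average hops:",
          average_hops(lambda a, b: tree_hops(a, b, levels), side))
```

In this toy model, tiles (1, 1) and (2, 2) are two hops apart on the mesh but four hops apart on the tree, because they sit in different top-level quadrants. Adding links between such tiles is one intuitive reading of how duplicated links move an H-tree toward mesh-like behavior, though the actual duplication scheme is defined in the paper and not reproduced here.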