The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
A Delay Model and Speculative Architecture for Pipelined Routers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Low-Latency Virtual-Channel Routers for On-Chip Networks
Proceedings of the 31st annual international symposium on Computer architecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Express virtual channels: towards the ideal interconnection fabric
Proceedings of the 34th annual international symposium on Computer architecture
An analysis of on-chip interconnection networks for large-scale chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
ORION 2.0: a fast and accurate NoC power and area model for early-stage design space exploration
Proceedings of the Conference on Design, Automation and Test in Europe
Prediction Router: A Low-Latency On-Chip Router Architecture with Multiple Predictors
IEEE Transactions on Computers
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
The inevitable advent of the multi-core era has driven an increasing demand for low latency on-chip interconnection networks~(or NoCs). Being a critical part of the memory hierarchy for modern chip multi-processors~(CMPs), these networks face stringent design constraints to provide fast communication with tight power budget. Modern NoC's first-order concern is clearly its latency, while we also find that internal bandwidth of its routers is relatively plentiful; thus, we present a low latency router design utilizing a technique we call "multicast within a router" or McRouter, which allows productive utilization of remaining bandwidth inside a NoC router. McRouter allows a single cycle transfer of flits which shortens the communication latency when there is enough remaining bandwidth within the router. The key idea is to transmit a header flit to all possible output ports (multicast) so that it is always transmitted to the correct output port without relying on route computation. In addition, we find it is affordable with marginal power overhead while still being a stand-alone design by maintaining portability and modularity (unlike look-ahead routing based designs). Our evaluation with application traffic shows that McRouter helps achieving system speed-ups of 1.28, 1.17 and 1.05 over the conventional router~(CR), the VSA router~(VSAR) and the prediction router~(PR), respectively.