Logical effort: designing fast CMOS circuits
Logical effort: designing fast CMOS circuits
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Bochs: A Portable PC Emulator for Unix/X
Linux Journal
A new switch chip for IBM RS/6000 SP systems
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Worst-case traffic for oblivious routing functions
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
The Alpha 21364 Network Architecture
IEEE Micro
Routers with a single stage of buffering
Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications
Orion: a power-performance simulator for interconnection networks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
A Delay Model and Speculative Architecture for Pipelined Routers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Power-driven Design of Router Microarchitectures in On-chip Networks
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Low-Latency Virtual-Channel Routers for On-Chip Networks
Proceedings of the 31st annual international symposium on Computer architecture
A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks
Proceedings of the 33rd annual international symposium on Computer Architecture
ViChaR: A Dynamic Virtual Channel Regulator for Network-on-Chip Routers
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Express virtual channels: towards the ideal interconnection fabric
Proceedings of the 34th annual international symposium on Computer architecture
A High-Throughput Distributed Shared-Buffer NoC Router
IEEE Computer Architecture Letters
ORION 2.0: a fast and accurate NoC power and area model for early-stage design space exploration
Proceedings of the Conference on Design, Automation and Test in Europe
Exploiting inherent information redundancy to manage transient errors in NoC routing arbitration
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
A single-cycle output buffered router with layered switching for Networks-on-Chips
Computers and Electrical Engineering
Low power flitwise routing in an unidirectional torus with minimal buffering
Proceedings of the Fifth International Workshop on Network on Chip Architectures
Analytical modeling for multi-transaction bus on distributed systems
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
Addressing network-on-chip router transient errors with inherent information redundancy
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Wireless Health Systems, On-Chip and Off-Chip Network Architectures
Journal of Electronic Testing: Theory and Applications
Hi-index | 0.00 |
Router microarchitecture plays a central role in the performance of an on-chip network (NoC). Buffers are needed in routers to house incoming flits which cannot be immediately forwarded due to contention. This buffering can be done at the inputs or the outputs of a router, corresponding to an input-buffered router (IBR) or an output-buffered router (OBR). OBRs are attractive because they can sustain higher throughputs and have lower queuing delays under high loads than IBRs. However, a direct implementation of an OBR requires a router speedup equal to the number of ports, making such a design prohibitive under aggressive clocking needs and limited power budgets of most NoC applications. In this paper, we propose a new router design that aims to emulate an OBR practically, based on a distributed shared-buffer (DSB) router architecture. We introduce innovations to address the unique constraints of NoCs, including efficient pipelining and novel flow-control. We also present practical DSB configurations that can reduce the power overhead with negligible degradation in performance. The proposed DSB router achieves up to 19% higher throughput on synthetic traffic and reduces packet latency by 60% on average for SPLASH-2 benchmarks with high contention, compared to a state-of-art pipelined IBR. On average, the saturation throughput of DSB routers is within 10% of the theoretically ideal saturation throughput under the synthetic workloads evaluated.