Bufferless switches can be an attractive, energy-efficient design option for on-chip networks when network utilization is low and low-latency operation matters most. However, this option is limited by the complexity of the control logic that a bufferless switch requires: the logic imposes large delays and limits the clock frequency. Pipelining the control logic is not an option in this low-latency environment. In this paper, we propose a new switch allocator for bufferless switches that parallelizes the steps required to match requesting inputs with available outputs, and that admits significantly faster implementations.
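To make the matching problem concrete, here is a minimal sketch of a generic switch allocation step for a bufferless switch. It is not the allocator proposed in the paper; it is a plain per-output rotating-priority arbiter, and the names `allocate`, `requests`, and `start` are hypothetical. Each input requests at most one output, each output is granted to at most one input, and inputs that lose have no buffer to wait in, so they must be dropped or deflected.

```python
from typing import Optional


def allocate(
    requests: list[Optional[int]],
    num_outputs: int,
    start: int = 0,
) -> tuple[dict[int, int], list[int]]:
    """Match requesting inputs to outputs (illustrative sketch only).

    requests[i] is the output port that input i wants, or None if idle.
    start is the input with the highest priority in this cycle
    (an assumed rotating-priority scheme, not taken from the paper).
    Returns (grants, losers): grants maps input -> output, and losers
    lists the inputs that requested an output but did not receive it.
    """
    n = len(requests)
    grants: dict[int, int] = {}
    # Each output arbitrates independently of the others, so in hardware
    # these iterations could run in parallel, one arbiter per output.
    for out in range(num_outputs):
        for offset in range(n):
            inp = (start + offset) % n
            if requests[inp] == out:
                grants[inp] = out
                break  # at most one winner per output
    losers = [
        i for i, req in enumerate(requests)
        if req is not None and i not in grants
    ]
    return grants, losers


# Inputs 0 and 1 both want output 2; input 2 is idle; input 3 wants output 0.
grants, losers = allocate([2, 2, None, 0], num_outputs=4)
```

With the default priority, input 0 wins output 2 and input 1 is the loser; calling `allocate` with `start=1` reverses that outcome. Even this simple scheme scans every input for every output, which hints at why allocation logic can dominate the cycle time of a switch that cannot pipeline it.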