VLSI micro-architectures for high-radix crossbar schedulers

Authors:
Giorgos Passas;Manolis Katevenis;Dionisios Pnevmatikatos
Affiliations:
Institute of Computer Science (ICS), Foundation for Research and Technology -- Hellas (FORTH), Heraklion, Crete, Greece (all three authors)
Venue:
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Year:
2011

Abstract

We study the scaling of parallel-matching crossbar schedulers to radices above 100. First, we examine a traditional microarchitecture that implements the matching decision of each input and each output of the crossbar in a separate arbiter block and communicates the matching decisions between the input and the output arbiters through global point-to-point links. Using simple models and experimentation with 90 nm CMOS layouts, we show that this architecture is expensive because the global point-to-point links take up O(N^4) area, where N is the radix of the crossbar. Next, by observing that the wiring of an arbiter fits in a minimal O(N log N) area, we propose a novel microarchitecture that inverts the locality of wires by orthogonally interleaving the input arbiters with the output arbiters, thus lowering the wiring area of the scheduler to O(N^2 log^2 N). Using this architecture, the scheduler for a radix-128 FIFO, VOQ, or 2-VC crossbar becomes gate limited, fitting in 3.6, 7.2, and 7.2 mm^2 respectively, which is a 40%, 50%, and 70% improvement over the traditional architecture. Moreover, the proposed schedulers find a new match in less than 10 ns, thus allowing a minimum packet size below 30 bytes at a 24 Gb/s line rate. Based on these findings, we conclude that crossbar schedulers are feasible even for radices above 100.
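
The two quantitative claims in the abstract can be checked with a short back-of-the-envelope sketch. It rests on assumptions that the abstract does not spell out: the logarithm is taken base 2, all constant factors in the asymptotic area terms are dropped (so only the growth ratio is meaningful, not absolute area), and the minimum packet size is read as the number of bytes one line delivers during one scheduling time. The function names are hypothetical and chosen for illustration only.

```python
import math


def min_packet_bytes(line_rate_gbps, sched_time_ns):
    """Bytes arriving on one line during one scheduling time.

    Gb/s multiplied by ns gives bits, since the 1e9 and 1e-9 factors cancel.
    """
    bits = line_rate_gbps * sched_time_ns
    return bits / 8


def wiring_growth(n):
    """Asymptotic wiring-area terms for radix n, constants dropped.

    Assumes log base 2; returns (traditional, interleaved).
    """
    traditional = n ** 4                      # O(N^4) global point-to-point links
    interleaved = n ** 2 * math.log2(n) ** 2  # O(N^2 log^2 N) interleaved arbiters
    return traditional, interleaved


# A 10 ns match time at 24 Gb/s corresponds to a 30-byte packet.
print(min_packet_bytes(24, 10))

# Growth terms at the radix studied in the paper (N = 128).
trad, inter = wiring_growth(128)
print(trad, inter, round(trad / inter, 1))
```

Under these assumptions, a match time just under 10 ns gives a packet size just under 30 bytes, which agrees with the abstract. The ratio of the two growth terms at N = 128 is only a rough indication of how much faster the traditional wiring grows; it is not comparable to the measured area improvements, which also include gate area.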