Fast parallel prefix logic circuits for n2n round-robin arbitration

Authors:
H. Fatih Ugurdag;Onur Baskirt
Affiliations:
Department of Electrical & Electronics Engineering, Ozyegin University, Nisantepe 34782 Cekmekoy, Istanbul, Turkey;Ericsson, ITU Ari-2 Techopark, 34390 Maslak, Istanbul, Turkey
Venue:
Microelectronics Journal
Year:
2012

Abstract

An n2n round-robin arbiter (RRA) searches its n inputs for a 1, starting from the highest-priority input. It picks the first 1 and outputs its index in one-hot encoding. An RRA aims to be fair to its inputs and maintains fairness by rotating the input priorities, i.e., the last arbitrated input becomes the lowest-priority input. Arbiters are used to multiplex shared resources among requestors, as well as in dispatch logic, where the purpose is load balancing among multiple resources. Today, arbiters have hundreds of ports and usually need to run at very high clock speeds. This article presents a new gate-level RRA circuit called Thermo Coded-Parallel Prefix Arbiter (TC-PPA) that scales to any number of requestors. It uses parallel prefix network topologies (borrowed from fast carry-lookahead adders) to generate a thermometer-coded pointer, thus greatly reducing the critical path. Code generators were written not only for TC-PPA but also for five highly competitive circuits from the literature (nine including their variants), and a rich set of timing/area results was obtained using a standard-cell-based logic synthesis flow with a novel iterative strategy based on binary search. Synthesis runs were performed both with and without wire-load models. Results show that for 54 or more ports (except 256), TC-PPA offers the best timing (lowest latency) as well as competitive area. Contributions also include transaction-level simulations showing that when pipelining is used to boost the clock rate, latency and input FIFO sizes are adversely affected; hence pipelining cannot be exploited indiscriminately to trim the clock period.
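
The round-robin behavior described in the abstract can be illustrated with a small behavioral model. The Python sketch below is not the TC-PPA gate-level circuit; it only assumes a generic mask-based scheme in which a thermometer-coded pointer (1s at every position that currently has priority) masks the requests, and a find-first-one step built on a log-depth prefix OR selects the winner. The names `prefix_or`, `first_one`, and `arbitrate` are hypothetical and chosen for illustration only.

```python
def prefix_or(bits):
    """Inclusive prefix OR, computed in ceil(log2(n)) doubling steps."""
    out = list(bits)
    shift = 1
    while shift < len(out):
        # Each step combines every position with the one `shift` places before it.
        out = [out[i] | (out[i - shift] if i >= shift else 0)
               for i in range(len(out))]
        shift *= 2
    return out


def first_one(bits):
    """One-hot vector marking the lowest-index 1 in bits (all zeros if none)."""
    p = prefix_or(bits)
    # A bit wins only if no lower-index bit is set.
    return [bits[i] & (1 - (p[i - 1] if i > 0 else 0))
            for i in range(len(bits))]


def arbitrate(requests, thermo):
    """Return (one-hot grant, next thermometer-coded pointer).

    thermo[i] == 1 marks the inputs that currently have priority
    (assumption: index 0 is searched first within each group).
    """
    masked = [r & t for r, t in zip(requests, thermo)]
    grant = first_one(masked)
    if not any(grant):
        # No request in the priority region: wrap around to the full vector.
        grant = first_one(requests)
    if any(grant):
        # The winner becomes the lowest-priority input for the next round.
        winner = grant.index(1)
        thermo = [1 if i > winner else 0 for i in range(len(requests))]
    return grant, thermo


if __name__ == "__main__":
    pointer = [1, 1, 1, 1]          # initially every input has priority
    reqs = [1, 0, 1, 1]             # inputs 0, 2 and 3 are requesting
    for _ in range(4):
        grant, pointer = arbitrate(reqs, pointer)
        print(grant, pointer)
```

With the same three requestors asserted on every cycle, the grant in this model rotates through inputs 0, 2, 3 and then back to 0, which is the fairness property the abstract describes. A hardware implementation would compute the prefix network combinationally rather than in a loop.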