A near-optimal real-time hardware scheduler for large cardinality crossbar switches

Authors:
Raymond R. Hoare; Zhu Ding; Alex K. Jones
Affiliations:
Concurrent EDA, LLC; Union Switch & Signal; University of Pittsburgh
Venue:
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Year:
2006

Abstract

The maximum matching algorithm for bipartite graphs can be used to provide optimal scheduling for crossbar-based interconnection networks. Unfortunately, maximum matching requires O(N³) time for an N × N communication system, which has limited its application to real-time network scheduling. In this paper, we show how maximum matching can be reformulated in terms of Boolean operations rather than the more traditional formulations. By taking advantage of the inherent parallelism available in custom hardware design, we introduce three maximum matching implementations in hardware and show how design complexity can be traded for performance. Specifically, we examine a Pure Logic Scheduler with three dimensions of parallelism, a Matrix Scheduler with two dimensions of parallelism, and a Vector Scheduler with one dimension of parallelism. These designs reduce the algorithmic time complexity to O(1), O(K), and O(KN), respectively, where K is the number of optimization steps. While an optimal scheduling algorithm requires K = 2N − 1 steps, our simulation results show that the scheduler can achieve 99% of the optimal schedule when K = 9. We examine the hardware and time complexity of these architectures for crossbar sizes of up to N = 1024. Using FPGA synthesis results, we show that a greedy schedule for crossbars of various sizes, ranging from 8 × 8 to 256 × 256, can be optimized in less than 20 ns per optimization step. For 1024 × 1024 crossbars, scheduling can be completed in approximately 10 μs with current technology and could fall below 90 ns with future technologies.
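The following Python sketch illustrates the general idea in software: a greedy schedule followed by augmenting-path optimization steps over a Boolean request matrix. It is not the authors' Pure Logic, Matrix, or Vector Scheduler. Several assumptions are not taken from the paper. Requests are stored as one integer bitmask per input (bit j set when input i requests output j), and the function names are hypothetical. An "optimization step" is modeled as lengthening the augmenting-path search by one edge, so K caps the path length. Under that assumption K = 2N − 1 admits every possible augmenting path, because an alternating path in an N × N bipartite graph visits at most 2N vertices.

```python
"""Software sketch of crossbar scheduling as bipartite matching on bit vectors.

Assumptions (not from the paper): req[i] is an int bitmask whose bit j is set
when input i requests output j; K is modeled as the maximum augmenting-path
length in edges. Function names are hypothetical.
"""
from typing import List, Optional


def greedy_schedule(req: List[int], n: int) -> List[Optional[int]]:
    """Greedy initial schedule: each input takes its lowest-numbered free requested output."""
    match_in: List[Optional[int]] = [None] * n  # output granted to each input
    taken = 0  # bitmask of outputs already granted
    for i in range(n):
        free = req[i] & ~taken
        if free:
            j = (free & -free).bit_length() - 1  # index of the lowest set bit
            match_in[i] = j
            taken |= 1 << j
    return match_in


def augment_once(req: List[int], match_in: List[Optional[int]], n: int, k: int) -> bool:
    """Breadth-first search for one augmenting path of at most k edges; flip it if found."""
    match_out: List[Optional[int]] = [None] * n  # input holding each output
    for i, j in enumerate(match_in):
        if j is not None:
            match_out[j] = i
    frontier = [i for i in range(n) if match_in[i] is None]  # unmatched inputs
    parent = {}   # output j -> input that first reached it
    seen_out = 0  # bitmask of outputs already visited
    length = 1    # path length (edges) to the outputs found in this layer
    while frontier and length <= k:
        next_frontier = []
        for i in frontier:
            new = req[i] & ~seen_out  # unvisited outputs requested by input i
            seen_out |= new
            while new:
                j = (new & -new).bit_length() - 1
                new &= new - 1  # clear the lowest set bit
                parent[j] = i
                if match_out[j] is None:
                    # Free output reached: reassign grants back along the path.
                    while j is not None:
                        i = parent[j]
                        prev = match_in[i]
                        match_in[i] = j
                        j = prev
                    return True
                next_frontier.append(match_out[j])  # follow the matched edge
        frontier = next_frontier
        length += 2  # one unmatched edge plus one matched edge per layer
    return False


def schedule(req: List[int], n: int, k: int) -> List[Optional[int]]:
    """Greedy schedule, then augment until no path of at most k edges remains."""
    match_in = greedy_schedule(req, n)
    while augment_once(req, match_in, n, k):
        pass  # each success grants one more input, so at most n iterations
    return match_in
```

Because the greedy pass already grants any free output an input requested, no augmenting path of length 1 remains, so with K = 1 the sketch returns the greedy schedule unchanged. For the request matrix `[0b11, 0b01]`, greedy yields `[0, None]`, and `schedule(req, 2, 3)` improves it to the maximum matching `[1, 0]`.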