A two-stage hardware scheduler combining greedy and optimal scheduling

Authors:
Raymond R. Hoare;Zhu Ding;Alex K. Jones
Affiliations:
Concurrent EDA, LLC., 357 N. Craig Street, Pittsburgh, PA 15213, USA;Union Switch and Signal, Pittsburgh, PA 15219, USA;Department of Electrical and Computer Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, PA 15261, USA and Department of Computer Science, University of Pittsburgh, 3700 O'Hara ...
Venue:
Journal of Parallel and Distributed Computing
Year:
2008

Abstract

Greedy scheduling heuristics provide a low-complexity, scalable, albeit sub-optimal, strategy for hardware-based crossbar schedulers. In contrast, the maximum matching algorithm for bipartite graphs can provide optimal scheduling for crossbar-based interconnection networks, but at a significant cost in complexity and scalability. In this paper, we leverage the inherent parallelism available in custom hardware design to reformulate maximum matching in terms of Boolean operations rather than the more traditional matrix computations, and we introduce three hardware implementations of maximum matching. Specifically, we examine a Pure Logic Scheduler with three dimensions of parallelism, a Matrix Scheduler with two dimensions of parallelism, and a Vector Scheduler with one dimension of parallelism. These designs reduce the algorithmic complexity for an N×N network from O(N^3) to O(1), O(K), and O(KN), respectively, where K is the number of optimization steps. While an optimal scheduling algorithm requires K=2N-1 steps, our simulation results show that, by starting from an initial schedule generated by our hardware-based greedy strategy, the maximum matching scheduler can achieve 99% of the optimal schedule when K=9. We examine the hardware and time complexity of these architectures for crossbar sizes of up to N=1024. Using FPGA synthesis results, we show that a greedy schedule for crossbars ranging from 8×8 to 256×256 can be optimized in less than 20 ns per optimization step. For crossbars reaching 1024×1024, the scheduling can be completed in approximately 10 μs with current technology and could reach under 90 ns with future technologies.
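
The two-stage idea described in the abstract (a greedy initial schedule refined by a bounded number of maximum-matching steps) can be illustrated with a minimal software sketch. This is not the paper's Boolean hardware formulation: the function names are hypothetical, the request matrix is an assumed input format, and counting one augmenting-path search as one optimization step is an assumption that may differ from how the paper defines K.

```python
def greedy_schedule(requests):
    """Stage 1 (assumed form): grant each input the first free output it requests."""
    n = len(requests)
    match_out = [-1] * n  # match_out[i] = output granted to input i, or -1
    match_in = [-1] * n   # match_in[j] = input holding output j, or -1
    for i in range(n):
        for j in range(n):
            if requests[i][j] and match_in[j] == -1:
                match_out[i] = j
                match_in[j] = i
                break
    return match_out, match_in


def augment(requests, match_out, match_in, i, seen):
    """Depth-first search for an augmenting path starting at input i."""
    for j in range(len(requests)):
        if requests[i][j] and not seen[j]:
            seen[j] = True
            # Output j is free, or its current holder can be moved elsewhere.
            if match_in[j] == -1 or augment(
                requests, match_out, match_in, match_in[j], seen
            ):
                match_out[i] = j
                match_in[j] = i
                return True
    return False


def two_stage_schedule(requests, k):
    """Stage 2: refine the greedy schedule with at most k augmentation attempts."""
    match_out, match_in = greedy_schedule(requests)
    steps = 0
    for i in range(len(requests)):
        if steps >= k:
            break
        if match_out[i] == -1 and any(requests[i]):
            steps += 1
            augment(requests, match_out, match_in, i, [False] * len(requests))
    return match_out


# requests[i][j] is truthy when input i has traffic queued for output j.
requests = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
greedy_only = two_stage_schedule(requests, 0)
refined = two_stage_schedule(requests, 1)
```

In this small example the greedy stage leaves input 1 unmatched because input 0 has already taken output 0. A single augmentation step reroutes inputs 0 and 2 so that all three inputs are granted an output.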