Greedy scheduling heuristics provide a low-complexity, scalable, though markedly sub-optimal strategy for hardware-based crossbar schedulers. In contrast, the maximum matching algorithm for bipartite graphs can provide optimal scheduling for crossbar-based interconnection networks, but at a significant cost in complexity and scalability. In this paper, we leverage the inherent parallelism available in custom hardware design to reformulate maximum matching in terms of Boolean operations rather than the more traditional matrix computations, and we introduce three hardware implementations of maximum matching. Specifically, we examine a Pure Logic Scheduler with three dimensions of parallelism, a Matrix Scheduler with two dimensions of parallelism, and a Vector Scheduler with one dimension of parallelism. These designs reduce the algorithmic complexity for an NxN network from O(N^3) to O(1), O(K), and O(KN), respectively, where K is the number of optimization steps. While an optimal scheduling algorithm requires K=2N-1 steps, our simulation results show that, by starting from an initial schedule generated by our hardware-based greedy strategy, the maximum matching scheduler can achieve 99% of the optimal schedule when K=9. We examine the hardware and time complexity of these architectures for crossbar sizes of up to N=1024. Using FPGA synthesis results, we show that a greedy schedule for crossbars ranging from 8x8 to 256x256 can be optimized in less than 20 ns per optimization step. For crossbars reaching 1024x1024, the scheduling can be completed in approximately 10 μs with current technology and could reach under 90 ns with future technologies.
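
The greedy-then-optimize idea can be illustrated with a small software sketch. This is not the hardware design described above; it is a sequential Python illustration under stated assumptions: the crossbar requests are given as an NxN Boolean matrix, and one "optimization step" is taken to be one augmenting-path search for a still-unmatched input (the paper's exact definition of a step may differ). The function names `greedy_schedule` and `optimize` are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[bool]]


def greedy_schedule(requests: Matrix) -> List[Optional[int]]:
    """Grant each input the first free output it requests, scanning in order."""
    n = len(requests)
    match_of_input: List[Optional[int]] = [None] * n
    output_taken = [False] * n
    for i in range(n):
        for j in range(n):
            if requests[i][j] and not output_taken[j]:
                match_of_input[i] = j
                output_taken[j] = True
                break
    return match_of_input


def _augment(requests: Matrix, i: int,
             input_of_output: List[Optional[int]],
             visited: List[bool]) -> bool:
    """Depth-first search for an augmenting path starting at input i."""
    for j in range(len(requests)):
        if requests[i][j] and not visited[j]:
            visited[j] = True
            owner = input_of_output[j]
            # Take output j if it is free, or if its owner can move elsewhere.
            if owner is None or _augment(requests, owner, input_of_output, visited):
                input_of_output[j] = i
                return True
    return False


def optimize(requests: Matrix, match_of_input: List[Optional[int]],
             k: int) -> List[Optional[int]]:
    """Improve a schedule with at most k augmenting-path searches.

    Assumption: one search counts as one optimization step.
    """
    n = len(requests)
    input_of_output: List[Optional[int]] = [None] * n
    for i, j in enumerate(match_of_input):
        if j is not None:
            input_of_output[j] = i
    steps = 0
    for i in range(n):
        if steps >= k:
            break
        # Only inputs left unmatched by the initial schedule need a search.
        if match_of_input[i] is None and any(requests[i]):
            steps += 1
            _augment(requests, i, input_of_output, [False] * n)
    improved: List[Optional[int]] = [None] * n
    for j, i in enumerate(input_of_output):
        if i is not None:
            improved[i] = j
    return improved


requests = [
    [True, True, False],
    [True, False, False],
    [False, True, True],
]
initial = greedy_schedule(requests)
final = optimize(requests, initial, k=1)
```

In this 3x3 example the greedy pass leaves input 1 unmatched because input 0 has already taken its only requested output; a single augmenting step reroutes inputs 0 and 2 so that all three inputs are served. Bounding the number of steps by `k` mirrors the trade-off described above between schedule quality and scheduling time.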