A two-stage hardware scheduler combining greedy and optimal scheduling

  • Authors:
  • Raymond R. Hoare;Zhu Ding;Alex K. Jones

  • Affiliations:
  • Concurrent EDA, LLC., 357 N. Craig Street, Pittsburgh, PA 15213, USA;Union Switch and Signal, Pittsburgh, PA 15219, USA;Department of Electrical and Computer Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, PA 15261, USA and Department of Computer Science, University of Pittsburgh, 3700 O'Hara ...

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2008

Abstract

Greedy scheduling heuristics provide a low-complexity, scalable, albeit sub-optimal strategy for hardware-based crossbar schedulers. In contrast, the maximum matching algorithm for bipartite graphs provides optimal scheduling for crossbar-based interconnection networks, but at a significant cost in complexity and scalability. In this paper, we leverage the inherent parallelism available in custom hardware design to reformulate maximum matching in terms of Boolean operations rather than the more traditional matrix computations, and we introduce three hardware implementations of maximum matching. Specifically, we examine a Pure Logic Scheduler with three dimensions of parallelism, a Matrix Scheduler with two dimensions of parallelism, and a Vector Scheduler with one dimension of parallelism. These designs reduce the algorithmic complexity for an NxN network from O(N^3) to O(1), O(K), and O(KN), respectively, where K is the number of optimization steps. While an optimal scheduling algorithm requires K=2N-1 steps, our simulation results show that, by starting from an initial schedule generated by our hardware-based greedy strategy, the maximum matching scheduler can achieve 99% of the optimal schedule when K=9. We examine the hardware and time complexity of these architectures for crossbar sizes of up to N=1024. Using FPGA synthesis results, we show that a greedy schedule for crossbars ranging from 8x8 to 256x256 can be optimized in less than 20 ns per optimization step. For crossbars reaching 1024x1024, the scheduling can be completed in approximately 10 μs with current technology and could reach under 90 ns with future technologies.
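
The two-stage idea in the abstract (a greedy initial schedule refined by a bounded number of maximum-matching steps) can be illustrated with a small software sketch. This is not the paper's hardware design: it is a sequential Python illustration that uses the standard augmenting-path method for bipartite matching, and it assumes that one "optimization step" corresponds to one augmenting-path attempt, which may differ from how the paper defines K. The names `greedy_schedule`, `augment`, and `two_stage_schedule` are hypothetical.

```python
from typing import List, Optional


def greedy_schedule(requests: List[List[int]]) -> List[Optional[int]]:
    """Greedy pass: each input takes the first free output it requests.

    requests[i][j] is truthy when input i requests output j.
    Returns match_of_output, where match_of_output[j] is the input
    granted output j, or None if output j is idle.
    """
    n = len(requests)
    match_of_output: List[Optional[int]] = [None] * n
    for i in range(n):
        for j in range(n):
            if requests[i][j] and match_of_output[j] is None:
                match_of_output[j] = i
                break
    return match_of_output


def augment(requests: List[List[int]],
            match_of_output: List[Optional[int]],
            i: int,
            seen: List[bool]) -> bool:
    """Search for an augmenting path starting at unmatched input i."""
    for j in range(len(requests)):
        if requests[i][j] and not seen[j]:
            seen[j] = True
            owner = match_of_output[j]
            # Output j is free, or its current owner can move elsewhere.
            if owner is None or augment(requests, match_of_output, owner, seen):
                match_of_output[j] = i
                return True
    return False


def two_stage_schedule(requests: List[List[int]], k: int) -> List[Optional[int]]:
    """Greedy schedule followed by at most k augmentation steps.

    Assumption: one step = one successful augmenting path, so each
    step grows the matching by exactly one input/output pair.
    """
    n = len(requests)
    match_of_output = greedy_schedule(requests)
    for _ in range(k):
        matched_inputs = {i for i in match_of_output if i is not None}
        improved = False
        for i in range(n):
            if i not in matched_inputs and augment(
                    requests, match_of_output, i, [False] * n):
                improved = True
                break
        if not improved:
            break  # No augmenting path left: the matching is maximum.
    return match_of_output


# Input 1 only wants output 0, which the greedy pass gives to input 0.
demo_requests = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
print(greedy_schedule(demo_requests))        # [0, 2, None]
print(two_stage_schedule(demo_requests, 1))  # [1, 0, 2]
```

In this example the greedy pass leaves input 1 unserved, and a single augmentation step reassigns inputs 0 and 2 so that all three inputs are scheduled. Capping the loop at `k` mirrors the abstract's trade-off between schedule quality and the number of optimization steps.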