Generalized Multiprocessor Scheduling and Applications to Matrix Computations

  • Authors:
  • G. N. Srinivasa Prasanna;B. R. Musicus

  • Affiliations:
  • -;-

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 1996

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper considerably extends the multiprocessor scheduling techniques in [1], [2], and applies it to matrix arithmetic compilation. In [1], [2] we presented several new results in the theory of homogeneous multiprocessor scheduling. A directed acyclic graph (DAG) of tasks is to be scheduled. Tasks are assumed to be parallelizable驴as more processors are applied to a task, the time taken to compute it decreases, yielding some speedup. Because of communication, synchronization, and task scheduling overhead, this speedup increases less than linearly with the number of processors applied. The optimal scheduling problem is to determine the number of processors assigned to each task, and task sequencing, to minimize the finishing time.Using optimal control theory, in the special case where the speedup function of each task is p驴, where p is the amount of processing power applied to the task, a closed form solution for task graphs formed from parallel and series connections was derived [1], [2]. This paper extends these results for arbitrary DAGS. The optimality conditions impose nonlinear constraints on the flow of processing power from predecessors to successors, and on the finishing times of siblings. This paper presents a fast algorithm for determining and solving these nonlinear equations. The algorithm utilizes the structure of the finishing time equations to efficiently run a conjugate gradient minimization, leading to the optimal solution.The algorithm has been tested on a variety of DAGs commonly encountered in matrix arithmetic. The results show that if the p驴 speedup assumption holds, the schedules produced are superior to heuristic approaches. The algorithm has been applied to compiling matrix arithmetic [9], for the MIT Alewife machine, a distributed-shared memory multiprocessor. While matrix arithmetic tasks do not exactly satisfy the p驴 speedup assumptions, the algorithm can be applied as a good heuristic. The results show that the schedules produced by our algorithm are faster than alternative heuristic techniques.