Numerical recipes: the art of scientific computing
Scheduling Multiprocessor Tasks to Minimize Schedule Length
IEEE Transactions on Computers
Partitioning and scheduling parallel programs for execution on multiprocessors
Complexity of scheduling parallel task systems
SIAM Journal on Discrete Mathematics
Generalised multiprocessor scheduling using optimal control
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
A fast static scheduling algorithm for DAGs on an unbounded number of processors
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Software-extended coherent shared memory: performance and cost
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Hierarchical Compilation of Macro Dataflow Graphs for Multiprocessors with Local Memory
IEEE Transactions on Parallel and Distributed Systems
A Comparison of Heuristics for Scheduling DAGs on Multiprocessors
Proceedings of the 8th International Symposium on Parallel Processing
ICNC'05 Proceedings of the First international conference on Advances in Natural Computation - Volume Part II
Task scheduling for parallel multifrontal methods
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
This paper considerably extends the multiprocessor scheduling techniques in [1], [2], and applies them to matrix arithmetic compilation. In [1], [2] we presented several new results in the theory of homogeneous multiprocessor scheduling. A directed acyclic graph (DAG) of tasks is to be scheduled. Tasks are assumed to be parallelizable: as more processors are applied to a task, the time taken to compute it decreases, yielding some speedup. Because of communication, synchronization, and task scheduling overhead, this speedup increases less than linearly with the number of processors applied. The optimal scheduling problem is to determine the number of processors assigned to each task, and the task sequencing, so as to minimize the finishing time.

Using optimal control theory, in the special case where the speedup function of each task is p^α, where p is the amount of processing power applied to the task, a closed-form solution for task graphs formed from parallel and series connections was derived in [1], [2]. This paper extends these results to arbitrary DAGs. The optimality conditions impose nonlinear constraints on the flow of processing power from predecessors to successors, and on the finishing times of siblings. This paper presents a fast algorithm for determining and solving these nonlinear equations. The algorithm exploits the structure of the finishing-time equations to run a conjugate gradient minimization efficiently, leading to the optimal solution.

The algorithm has been tested on a variety of DAGs commonly encountered in matrix arithmetic. The results show that if the p^α speedup assumption holds, the schedules produced are superior to those of heuristic approaches. The algorithm has been applied to compiling matrix arithmetic [9] for the MIT Alewife machine, a distributed shared-memory multiprocessor. While matrix arithmetic tasks do not exactly satisfy the p^α speedup assumption, the algorithm can still be applied as a good heuristic.
The results show that the schedules produced by our algorithm are faster than alternative heuristic techniques.
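To illustrate the p^α speedup model described above, consider independent (parallel) tasks sharing a fixed pool of processing power P, where a task of work w run at power p takes time w / p^α. A consequence of the optimality conditions for such parallel connections is that total power should be split so that all siblings finish simultaneously, which gives each task power proportional to w^(1/α). The sketch below is illustrative only (the function names and values are assumptions, not the paper's implementation):

```python
# Illustrative sketch of the p^alpha speedup model: a task with work w
# run at processing power p takes time w / p**alpha (0 < alpha < 1).

def parallel_split(works, total_power, alpha):
    """Split total_power among parallel tasks so all finish simultaneously.

    Setting w_i / p_i**alpha equal for all i gives p_i proportional to
    w_i**(1/alpha); normalizing so the allocations sum to total_power.
    """
    shares = [w ** (1.0 / alpha) for w in works]
    s = sum(shares)
    return [total_power * sh / s for sh in shares]

def run_time(work, power, alpha):
    """Time to complete a task under the p**alpha speedup model."""
    return work / power ** alpha

alpha = 0.5                      # assumed sublinear speedup exponent
works = [4.0, 9.0, 25.0]         # hypothetical task sizes
alloc = parallel_split(works, total_power=10.0, alpha=alpha)
times = [run_time(w, p, alpha) for w, p in zip(works, alloc)]
# Under this split, every branch finishes at the same time.
```

Sequencing a series connection is simpler under this model: the full power pool is applied to each task in turn, so only parallel branches require the simultaneous-finish balancing shown here.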