Numerical recipes: the art of scientific computing
Scheduling Multiprocessor Tasks to Minimize Schedule Length
IEEE Transactions on Computers
Partitioning and scheduling parallel programs for execution on multiprocessors
Complexity of scheduling parallel task systems
SIAM Journal on Discrete Mathematics
Generalised multiprocessor scheduling using optimal control
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Hierarchical Compilation of Macro Dataflow Graphs for Multiprocessors with Local Memory
IEEE Transactions on Parallel and Distributed Systems
The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor
Compilation of parallel multimedia computations—extending retiming theory and Amdahl's law
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Task scheduling on bus-based networks of workstations
Cluster computing
Scheduling malleable tasks with precedence constraints
Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures
A workload-aware mapping approach for data-parallel programs
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Scheduling malleable tasks with precedence constraints
Journal of Computer and System Sciences
This paper considerably extends the multiprocessor scheduling techniques in [1] and applies them to matrix arithmetic compilation. In [1] we presented several new results in the theory of homogeneous multiprocessor scheduling. A directed acyclic graph (DAG) of tasks is to be scheduled. Tasks are assumed to be parallelizable: as more processors are applied to a task, the time taken to compute it decreases, yielding some speedup. Because of communication, synchronization, and task scheduling overhead, this speedup increases less than linearly with the number of processors applied. The optimal scheduling problem is to determine the number of processors assigned to each task, and the task sequencing, so as to minimize the finishing time.

Using optimal control theory, in the special case where the speedup function of each task is p^α, where p is the amount of processing power applied to the task, a closed-form solution for task graphs formed from parallel and series connections was derived in [1]. This paper extends these results to arbitrary DAGs. The optimality conditions impose nonlinear constraints on the flow of processing power from predecessors to successors, and on the finishing times of siblings. This paper presents a fast algorithm for determining and solving these nonlinear equations. The algorithm exploits the structure of the finishing-time equations to run a conjugate gradient minimization efficiently, leading to the optimal solution. The algorithm has been tested on a variety of DAGs, and the results presented show that it is superior to alternative heuristic approaches.
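As a concrete illustration of the speedup model (a minimal sketch, not the paper's implementation): under the assumed speedup function, a task of work w given processing power p finishes in time w / p^α. Requiring parallel siblings to finish simultaneously, with power split proportional to w_i^(1/α), yields an equivalent work of (Σ_i w_i^(1/α))^α for the combination, while series tasks simply add their works; this is the series-parallel reduction the abstract attributes to [1]. The value of α, the function names, and the example workloads below are all illustrative.

```python
# Sketch of the p^alpha speedup model and the series-parallel reduction.
# Assumed model: a task of work w given power p finishes in time w / p^alpha.

ALPHA = 0.5  # hypothetical speedup exponent, 0 < alpha <= 1


def task_time(work: float, power: float, alpha: float = ALPHA) -> float:
    """Finishing time of a task of the given work run with the given power."""
    return work / power ** alpha


def series_work(works: list[float]) -> float:
    """Equivalent work of tasks run one after another (works simply add)."""
    return sum(works)


def parallel_work(works: list[float], alpha: float = ALPHA) -> float:
    """Equivalent work of independent sibling tasks run side by side.

    Splitting total power proportionally to w_i^(1/alpha) makes every
    sibling finish at the same time, giving equivalent work
    (sum_i w_i^(1/alpha))^alpha.
    """
    return sum(w ** (1.0 / alpha) for w in works) ** alpha


if __name__ == "__main__":
    P = 16.0  # total processing power
    # Two parallel branches followed by a sequential reduction task.
    w_eq = series_work([parallel_work([4.0, 9.0]), 5.0])
    print(f"equivalent work: {w_eq:.3f}")
    print(f"makespan with P={P}: {task_time(w_eq, P):.3f}")
```

The reduction collapses any series-parallel task graph to a single equivalent task, so its makespan under total power P is just task_time(w_eq, P); the contribution of the paper summarized above is handling arbitrary DAGs, where no such closed-form collapse exists and the nonlinear finishing-time constraints are instead solved numerically.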