Crystal: from functional description to efficient parallel code
C3P Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues - Volume 1
Uniform techniques for loop optimization
ICS '91 Proceedings of the 5th international conference on Supercomputing
Optimal schedules for parallel prefix computation with bounded resources
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Some efficient solutions to the affine scheduling problem: I. One-dimensional time
International Journal of Parallel Programming
Recognizing and Parallelizing Bounded Recurrences
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Detection of Recurrences in Sequential Programs with Loops
PARLE '93 Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe
A programming language
Detection and global optimization of reduction operations for distributed parallel machines
ICS '96 Proceedings of the 10th international conference on Supercomputing
Scheduling reductions on realistic machines
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Speculative parallelization of partial reduction variables
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Scan detection and parallelization in "inherently sequential" nested loop programs
Proceedings of the Tenth International Symposium on Code Generation and Optimization
To detect more parallelism in scientific programs, one can exploit the parallelism inherent in reductions. This paper presents a method that schedules programs in which reductions are computed explicitly. We describe how reductions are expressed in our input language (which is in fact the output language of the reduction detector presented in [RF93]) and give a brief summary of scheduling techniques. To simplify scheduling, we assume that the target parallel computer has an infinite number of processors with infinite fan-in, and we show that a schedule computed under this model can be adapted to real parallel machines. We then present a scheduling method, based on the algorithms of [Fea92a, Fea92b], that works in the presence of reductions, and apply it to an example. Lastly, we show that scheduling reductions has two side effects: it simplifies the scheduling process and improves the computed schedules.
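As an illustration of the gap between the idealized model and a real machine, the sketch below is not the paper's algorithm. It only shows one common way a reduction that takes a single step under infinite fan-in might be expanded when each operator takes two inputs: a balanced tree, assuming the operator is associative. The function name `tree_reduce` and its return convention are hypothetical, chosen for this example.

```python
import operator


def tree_reduce(values, op):
    """Reduce `values` with a binary, associative `op` in pairwise rounds.

    Hypothetical helper for illustration. Returns (result, rounds), where
    `rounds` counts the parallel steps needed if every pair within a round
    is combined simultaneously on a fan-in-2 machine.
    """
    level = list(values)
    if not level:
        raise ValueError("cannot reduce an empty sequence")
    rounds = 0
    while len(level) > 1:
        # Combine neighbours (0,1), (2,3), ...; these are independent,
        # so they could all run in the same time step.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        # An unpaired last element is carried over to the next round.
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    total, steps = tree_reduce(range(1, 9), operator.add)
    print(total, steps)
```

Under these assumptions, a reduction over n values that the infinite fan-in model counts as one step takes about log2(n) rounds, so a schedule computed in the idealized model would need its reduction steps stretched accordingly.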