Techniques for integrating parallelizing transformations and compiler-based scheduling methods
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
This paper describes loop displacement, a mapping that can be used to transform loops and to schedule them for parallel execution. Loop displacement can be applied to hybrid multiply nested loops to achieve the following:
- Transform sequential loops into parallel loops.
- Transform and schedule loops to reduce synchronization overhead.
- Generate schedules that distribute the load evenly among the processors.
- Generate schedules that reduce idling of processors.
- Integrate loop transformation and loop scheduling.
Loop displacement can be used in place of loop alignment, loop coalescing, and loop skewing. It generalizes loop spreading, a scheduling technique for achieving good load balancing, to multiply nested loops, and it generates schedules with better load distribution and sometimes less processor idling than those generated by OPTAL, a static loop partitioning algorithm. Furthermore, it integrates loop transformation and loop scheduling, which provides better results. Achieving the goals of a variety of loop transformation and scheduling strategies through loop displacement can simplify the compiler and allow the development of intelligent techniques that integrate the application of transformations.
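The abstract does not define the loop displacement mapping itself, so the sketch below does not attempt to reproduce it. As an illustration only, it shows classical loop skewing (the wavefront method), one of the transformations the abstract says loop displacement can stand in for. The recurrence, the function names `sequential` and `skewed`, and the boundary values are assumptions chosen for the example, not taken from the paper.

```python
def sequential(n, m):
    """Reference loop nest: each iteration depends on (i-1, j) and (i, j-1)."""
    a = [[1] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 are boundary values
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def skewed(n, m):
    """Same computation after skewing: iterate over wavefronts t = i + j.

    Every iteration on a wavefront reads only values from wavefront t - 1,
    so the inner loop carries no dependence and could run in parallel.
    """
    a = [[1] * (m + 1) for _ in range(n + 1)]
    for t in range(2, n + m + 1):              # outer loop stays sequential
        lo = max(1, t - m)                     # keeps j = t - i <= m
        hi = min(n, t - 1)                     # keeps j = t - i >= 1
        # Compute all new values before storing any of them, to make the
        # independence of the iterations on one wavefront explicit.
        updates = [(i, t - i, a[i - 1][t - i] + a[i][t - i - 1])
                   for i in range(lo, hi + 1)]
        for i, j, value in updates:
            a[i][j] = value
    return a


if __name__ == "__main__":
    print(sequential(4, 5) == skewed(4, 5))
```

Note that the wavefronts differ in length (short near the corners, longest in the middle), which is the kind of uneven load that the scheduling goals listed in the abstract are concerned with.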