Parallel processor balance through loop spreading

Authors:
Y. Wu; T. Lewis
Affiliations:
Sequent Computer Systems, Inc., Beaverton, OR; Oregon State University, Corvallis, OR
Venue:
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Year:
1989

Abstract

When the number of processors P is less than the number of tasks N in a parallel loop, the loop must be executed in ⌈N/P⌉ rounds, and when N is not a multiple of P the last round executes only N mod P tasks. In many cases all but a few processors are idle during that last round, which causes a significant drop in performance, and the loss becomes increasingly detrimental as the number of processors grows. Loop spreading is a technique for restructuring parallel loops so that parallel tasks are balanced across multiple processors. A spread loop runs at least as fast as the non-spread loop even when N mod P = 0, and shows no performance drop as N changes. We show how the method keeps the performance of matrix multiplication and of a simplex algorithm from degrading as the input size changes.
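The round arithmetic above can be checked with a small model. This is a minimal sketch under stated assumptions, not the authors' transformation: it assumes each of the N tasks consists of W equal, independent work units (for example, inner-loop iterations) that may be redistributed freely, and it ignores scheduling and synchronization overhead. It compares whole-task, round-by-round scheduling (time ⌈N/P⌉·W) with dealing all N·W units out evenly over P processors (time ⌈N·W/P⌉). The function names and the parameter `units_per_task` are hypothetical, chosen for illustration.

```python
def rounds(n_tasks: int, n_procs: int) -> int:
    """Number of rounds, ceil(N/P), when each processor runs one task per round."""
    return -(-n_tasks // n_procs)  # integer ceiling division


def last_round_tasks(n_tasks: int, n_procs: int) -> int:
    """Tasks executed in the final round: N mod P, or P when P divides N."""
    r = n_tasks % n_procs
    return r if r else n_procs


def round_based_time(n_tasks: int, n_procs: int, units_per_task: int) -> int:
    """Completion time in work units when whole tasks are scheduled round by round."""
    return rounds(n_tasks, n_procs) * units_per_task


def spread_time(n_tasks: int, n_procs: int, units_per_task: int) -> int:
    """Completion time if all N*W units are split evenly across P processors.

    Hypothetical model: assumes every task divides into W independent,
    equal-cost units that any processor may execute.
    """
    return -(-(n_tasks * units_per_task) // n_procs)


def spread_assignment(n_tasks: int, n_procs: int, units_per_task: int):
    """Return, per processor, the (task, unit) pairs it executes.

    Contiguous block partition: the first (total mod P) processors
    receive one extra unit, so loads differ by at most one unit.
    """
    work = [(t, u) for t in range(n_tasks) for u in range(units_per_task)]
    q, r = divmod(len(work), n_procs)
    out, start = [], 0
    for p in range(n_procs):
        size = q + (1 if p < r else 0)
        out.append(work[start:start + size])
        start += size
    return out


if __name__ == "__main__":
    # P = 8 processors, W = 8 units per task
    for n in (16, 17, 20):
        print(n, rounds(n, 8), last_round_tasks(n, 8),
              round_based_time(n, 8, 8), spread_time(n, 8, 8))
    # 16 2 8 16 16
    # 17 3 1 24 17
    # 20 3 4 24 20
```

With N = 17 and P = 8, the round-based loop needs three rounds and seven of the eight processors sit idle in the last one, so it finishes at 24 units, whereas the evenly spread model finishes at 17. Because N·W/P ≤ ⌈N/P⌉·W and the right-hand side is an integer, ⌈N·W/P⌉ never exceeds ⌈N/P⌉·W, so under this idealized model spreading is never slower and ties exactly when P divides N.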