The main aim of this paper is to study the allocation of processors to parallel programs executing on a multiprocessor system, and the resulting speedups. First, we consider a parallel program as a sequence of steps, where each step consists of a set of parallel operations. General bounds on the speedup on a p-processor system are derived from this model. Measurements of code parallelism for the LINPACK numerical package are presented to support the belief that typical numerical programs contain much potential parallelism that a good restructuring compiler can discover. Next, a parallel program is represented as a task graph whose nodes are doacross loops, i.e., loops whose iterations can be partially overlapped. It is shown how processors can be allocated to exploit both horizontal and vertical parallelism in such graphs. Two heuristic processor-allocation algorithms (WP and PA) are presented. PA is the heart of WP and is used to obtain efficient processor allocations for a set of independent parallel tasks; WP allocates processors to general task graphs. Finally, a general formula is given for the speedup of a doacross loop that is more accurate than the known formula.
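The two quantitative ideas in the abstract can be illustrated concretely. Under the step model, if step i contains b_i independent operations, a natural bound on the p-processor speedup is S_p = (sum over i of b_i) / (sum over i of ceil(b_i / p)); this is a sketch of the kind of bound such a model supports, not necessarily the paper's exact result. Likewise, the sketch below (in Python; the parameters N, T, d and the cyclic iteration-to-processor assignment are illustrative assumptions, not the paper's WP/PA algorithms or its speedup formula) simulates a doacross loop whose N iterations each take time T and must start at least d apart, and reports the resulting speedup on p processors.

    # Hypothetical sketch: doacross loop timing under cyclic scheduling.
    # N iterations, body time T, delay d between starts of consecutive
    # iterations, p processors. Not the paper's WP/PA algorithms.

    def doacross_time(N, T, d, p):
        """Finish time of a doacross loop with cyclic iteration assignment."""
        free = [0.0] * p          # time at which each processor becomes idle
        prev_start = -d           # lets iteration 0 start at time 0
        for i in range(N):
            proc = i % p          # cyclic assignment of iteration i
            start = max(prev_start + d,   # dependence delay from iteration i-1
                        free[proc])       # processor availability
            free[proc] = start + T
            prev_start = start
        return max(free)

    def speedup(N, T, d, p):
        serial = N * T            # one processor, no iteration overlap
        return serial / doacross_time(N, T, d, p)

    if __name__ == "__main__":
        N, T, d = 100, 10.0, 2.0
        for p in (1, 2, 4, 8, 16):
            print(f"p={p:2d}  speedup={speedup(N, T, d, p):5.2f}")

In this simulation the speedup saturates near T/d as p grows (about 4.8 for the parameters above), reflecting that the inter-iteration delay, rather than the processor count, eventually limits a doacross loop.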