Pipelined data parallel algorithms—concept and modeling

Authors:
C.-T. King;W.-H. Chou;L. M. Ni
Affiliations:
Michigan State Univ., East Lansing;Michigan State Univ., East Lansing;Michigan State Univ., East Lansing
Venue:
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Year:
1988

Abstract

A new style of efficient parallel algorithm for distributed-memory multiprocessors is introduced. It exploits parallelism through pipelined parallel computation, or large-grain pipelining. By using macro-pipelining between the nodes of the system, large-grain pipelining regulates the flow of data through the multiprocessor so that the degree of overlap is maximized and the effect of communication overhead is minimized. To model pipelined parallel computations, an analytic model is presented that takes into account both the underlying architecture and the behavior of the algorithm. The resulting model is accurate enough not only to predict the performance of a given algorithm but also to assist in algorithm design by determining optimal design parameters such as the granularity. Results from experiments performed on a 64-node NCUBE multiprocessor closely match the predicted performance. A systematic procedure for designing pipelined data parallel algorithms from nested loop programs is described. The impact of second-generation distributed-memory multiprocessors on pipelined parallel computations is also discussed.
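The abstract says the analytic model helps choose the granularity of a pipelined computation. The paper's own model is not reproduced here. As a hedged illustration only, the sketch below uses a generic linear-pipeline cost model, which is an assumption and not the authors' formulation. It considers p nodes that each process a stream of n data units in blocks of g units. Each block costs g*t_c to compute plus t_s + g*t_w to forward to the next node, and computation and communication are assumed not to overlap within a node. The names n, p, g, t_c, t_s, t_w and all helper functions are hypothetical. The sketch shows the trade-off that granularity controls: small blocks pay the message startup cost t_s many times, while large blocks reduce the overlap between pipeline stages.

```python
import math


def stage_time(g: int, t_c: float, t_s: float, t_w: float) -> float:
    """Hypothetical cost for one node to handle a block of g units:
    compute g units, then send the block to the next node.
    Assumes computation and communication do not overlap within a node."""
    return g * t_c + t_s + g * t_w


def pipeline_time(g: int, n: int, p: int, t_c: float, t_s: float, t_w: float) -> float:
    """Completion time of a p-stage linear pipeline streaming n units in blocks of g.

    The first block needs p stage-times to traverse the pipeline; each of the
    remaining ceil(n / g) - 1 blocks leaves the last stage one stage-time later.
    """
    blocks = math.ceil(n / g)
    return (p + blocks - 1) * stage_time(g, t_c, t_s, t_w)


def best_grain(n: int, p: int, t_c: float, t_s: float, t_w: float) -> tuple[int, float]:
    """Exhaustively try block sizes 1..n and return (g, time) with the smallest time."""
    return min(
        ((g, pipeline_time(g, n, p, t_c, t_s, t_w)) for g in range(1, n + 1)),
        key=lambda pair: pair[1],
    )


def grain_estimate(n: int, p: int, t_c: float, t_s: float, t_w: float) -> float:
    """Continuous approximation: treating n / g as real-valued, minimizing
    (p - 1 + n / g) * (g * (t_c + t_w) + t_s) over g gives
    g* = sqrt(n * t_s / ((p - 1) * (t_c + t_w)))."""
    return math.sqrt(n * t_s / ((p - 1) * (t_c + t_w)))


if __name__ == "__main__":
    n, p = 1024, 8                     # hypothetical problem size and pipeline length
    t_c, t_s, t_w = 1.0, 7.0, 0.0      # hypothetical per-unit compute, startup, per-unit transfer
    print(best_grain(n, p, t_c, t_s, t_w))          # (32, 1521.0)
    print(grain_estimate(n, p, t_c, t_s, t_w))      # 32.0
    print(pipeline_time(1, n, p, t_c, t_s, t_w))    # 8248.0: finest grain, startup-dominated
    print(pipeline_time(n, n, p, t_c, t_s, t_w))    # 8248.0: one block, no pipeline overlap
```

With these made-up parameters, both extremes (one unit per message, or the whole stream as one message) cost 8248 time units, while the intermediate grain g = 32 costs 1521. This is the kind of design-parameter choice an analytic model is meant to guide. A realistic model would also need to account for overlapping communication with computation, routing distance, and per-node memory limits.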