Pipelined Data Parallel Algorithms-II: Design

Authors:
C. T. King;W. H. Chou;L. M. Ni
Affiliations:
-;-;-
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1990

Citing 18
Cited 14

Spacetime representations of computational structures

Computing
Advanced compiler optimizations for supercomputers

Communications of the ACM - Special issue on parallelism
Deadlock-Free Message Routing in Multiprocessor Interconnection Networks

IEEE Transactions on Computers
Multicomputers: Message-Passing Concurrent Computers

Computer
Data dependence and its application to parallel processing

International Journal of Parallel Programming
Synthesizing Linear Array Algorithms from Nested FOR Loop Algorithms

IEEE Transactions on Computers
Page table management in local/remote architectures

ICS '88 Proceedings of the 2nd international conference on Supercomputing
On the problem of optimizing data transfers for complex memory systems

ICS '88 Proceedings of the 2nd international conference on Supercomputing
Semi-automatic process partitioning for parallel computation

International Journal of Parallel Programming
Compiling C* programs for a hypercube multicomputer

PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
Networks for parallel processors: measurements and prognostications

C3P Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues - Volume 1
On partitioning and mapping for hypercube computing

International Journal of Parallel Programming
Dynamic Page Migration in Multiprocessors with Distributed Global Memory

IEEE Transactions on Computers
Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors

IEEE Transactions on Computers
Utilizing Multidimensional Loop Parallelism on Large Scale Parallel Processor Systems

IEEE Transactions on Computers
Configuring parallel programs

BYTE
Dependence graphs and compiler optimizations

POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Pipelined Data Parallel Algorithms-I: Concept and Modeling

IEEE Transactions on Parallel and Distributed Systems

Tiling multidimensional iteration spaces for nonshared memory machines

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Executing DSP Applications in a Fine-Grained Dataflow Environment

IEEE Transactions on Software Engineering
Scheduling Multiprocessor Tasks with Genetic Algorithms

IEEE Transactions on Parallel and Distributed Systems
Chain Grouping: A Method for Partitioning Loops onto Mesh-Connected Processor Arrays

IEEE Transactions on Parallel and Distributed Systems
Optimal semi-oblique tiling

Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Optimal tiling for the RNA base pairing problem

Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Pipelined Data Parallel Algorithms-I: Concept and Modeling

IEEE Transactions on Parallel and Distributed Systems
Partitioning and Mapping Nested Loops on Multiprocessor Systems

IEEE Transactions on Parallel and Distributed Systems
Optimal Processor Assignment for a Class of Pipelined Computations

IEEE Transactions on Parallel and Distributed Systems
Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers

IEEE Transactions on Parallel and Distributed Systems
A General Methodology of Partitioning and Mapping for Given Regular Arrays

IEEE Transactions on Parallel and Distributed Systems
Collection-Aware Optimum Sequencing of Operations and Closed-Form Solutions for the Distribution of a Divisible Load on Arbitrary Processor Trees

IEEE Transactions on Parallel and Distributed Systems
Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs

The Journal of Supercomputing
Quantifying the potential benefit of overlapping communication and computation in large-scale scientific applications

Proceedings of the 2006 ACM/IEEE conference on Supercomputing

Quantified Score

Hi-index	0.00

Visualization

Abstract

A methodology for designing pipelined data-parallel algorithms on multicomputers is studied. The design procedure starts with a sequential algorithm which can be expressed as a nested loop with constant loop-carried dependencies. The procedure's main focus is on partitioning the loop by grouping related iterations together. Grouping is necessary to balance the communication overhead with the available parallelism and to produce pipelined execution patterns, which result in pipelined data-parallel computations. The grouping should satisfy dependence relationships among the iterations and also allow the granularity to be controlled. Various properties of grouping are studied, and methods for generating communication-efficient grouping are given. Given a grouping and an assignment of the groups to the processors, an analytic model is combined with the grouping results to describe the behavior and to estimate the performance of the resultant parallel program. Expressions characterizing the performance are derived.