A new style of efficient parallel algorithm for distributed-memory multiprocessors is introduced that exploits parallelism through pipelined parallel computation, or large-grain pipelining. By macro-pipelining between the nodes of the system, large-grain pipelining regulates the flow of data through the multiprocessor so that the degree of overlap is maximized and the effect of communication overhead is minimized. An analytic model of pipelined parallel computation is presented that accounts for both the underlying architecture and the behavior of the algorithm. The model is accurate enough not only to predict the performance of a given algorithm but also to guide algorithm design by determining optimal design parameters such as granularity. Results of experiments on a 64-node NCUBE multiprocessor closely match the predicted performance. A systematic procedure for deriving pipelined data parallel algorithms from nested loop programs is described, and the impact of second-generation distributed-memory multiprocessors on pipelined parallel computation is also discussed.
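To make the granularity trade-off concrete, the following is a minimal sketch of a generic linear-pipeline cost model. It is not the analytic model of the paper, whose details the abstract does not give. It assumes that every stage handles one block of `grain` items per step, and that a step costs the computation time for the block plus a message startup time plus a per-item transfer time. The function names, the parameters, and the sample costs are all hypothetical.

```python
import math


def pipeline_time(n_items, n_stages, grain, t_comp, t_startup, t_per_item):
    """Estimate the run time of a linear macro-pipeline.

    Assumed (hypothetical) cost model: the work is cut into blocks of
    `grain` items, and each pipeline step costs the time to compute one
    block plus the time to send it to the next node.
    """
    n_blocks = math.ceil(n_items / grain)
    step_time = grain * t_comp + t_startup + grain * t_per_item
    # The pipeline needs (n_stages - 1) steps to fill, then one step per block.
    return (n_stages - 1 + n_blocks) * step_time


def best_grain(n_items, n_stages, t_comp, t_startup, t_per_item):
    """Exhaustively search for the block size with the lowest estimated time."""
    return min(
        range(1, n_items + 1),
        key=lambda g: pipeline_time(
            n_items, n_stages, g, t_comp, t_startup, t_per_item
        ),
    )


if __name__ == "__main__":
    # Illustrative values only: 1024 items, 64 nodes, arbitrary time units.
    params = dict(n_stages=64, t_comp=1.0, t_startup=100.0, t_per_item=0.5)
    g = best_grain(1024, **params)
    print("best grain:", g)
    print("estimated time:", pipeline_time(1024, grain=g, **params))
```

Under these assumptions, a very small grain pays the message startup cost many times, while a very large grain leaves most nodes idle while the pipeline fills, so the lowest estimate falls at an intermediate block size.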