We study fine-grain computation on the Reconfigurable Ring of Processors ($\mathcal{RRP}$), a parallel architecture whose processing elements (PEs) are interconnected by a multiline reconfigurable bus. Each bus line is one packet wide and can be configured, independently of the other lines, to establish an arbitrary PE-to-PE connection. We present a "cooperative" message-passing protocol that, given suitable implementation technology, endows an $\mathcal{RRP}$ with message latency logarithmic in the number of PEs a message passes over in transit. We focus on the computational consequences of such latency for such an architecture. Our main results prove that: 1) an $N$-PE $\mathcal{RRP}$ can execute a sweep up or down an $N$-leaf complete binary tree in time proportional to $\log N \log\log N$; 2) a broad range of $N$-PE architectures, including $N$-PE $\mathcal{RRP}$s, require time at least proportional to $\log N \log\log N$ to perform such a sweep, so the sweep time in 1) is asymptotically optimal.
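To make "a sweep up or down an $N$-leaf complete binary tree" concrete, the following is a minimal sequential sketch, not the $\mathcal{RRP}$ algorithm itself. It uses prefix sums (the up-sweep and down-sweep of a scan) as an assumed example of a tree sweep. The cost function is a hypothetical naive embedding, also an assumption. In it, leaves sit on consecutive PEs, each internal node is hosted by the PE of its leftmost leaf, and a message that passes over $d$ PEs costs $1 + \lfloor \log_2 d \rfloor$ time units. Under that model, a level-by-level sweep costs about $(\log_2 N)^2/2$, which grows faster than the $\log N \log\log N$ bound stated above. The faster method from the paper is not reproduced here. The names `up_sweep`, `down_sweep`, and `naive_sweep_latency` are hypothetical.

```python
from typing import List


def up_sweep(leaves: List[int]) -> List[List[int]]:
    """Leaf-to-root sweep over a complete binary tree with len(leaves) leaves.

    Returns the tree level by level: levels[0] holds the leaves and
    levels[-1] holds the single root. Each internal node stores the sum
    of its two children (sum is an illustrative choice of operation).
    """
    n = len(leaves)
    if n == 0 or n & (n - 1):
        raise ValueError("number of leaves must be a power of two")
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        below = levels[-1]
        levels.append([below[2 * i] + below[2 * i + 1] for i in range(len(below) // 2)])
    return levels


def down_sweep(levels: List[List[int]]) -> List[int]:
    """Root-to-leaf sweep using the sums computed by up_sweep.

    Each node sends its incoming prefix to its left child and
    prefix + (left child's sum) to its right child, so the values that
    reach the leaves are the exclusive prefix sums of the leaves.
    """
    prefix = [0]  # the root receives the identity element
    for depth in range(len(levels) - 2, -1, -1):
        below = levels[depth]
        nxt: List[int] = []
        for i, p in enumerate(prefix):
            nxt.append(p)                 # left child
            nxt.append(p + below[2 * i])  # right child
        prefix = nxt
    return prefix


def naive_sweep_latency(n: int) -> int:
    """Time units for one level-by-level sweep under a HYPOTHETICAL model.

    Assumptions (not from the paper): leaves on PEs 0..n-1 in order, each
    internal node hosted by the PE of its leftmost leaf, and a message that
    passes over d PEs costs 1 + floor(log2 d). At tree level k the two
    children are 2**(k-1) PEs apart, so that level costs k time units.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    height = n.bit_length() - 1  # log2(n) levels of internal nodes
    total = 0
    for k in range(1, height + 1):
        d = 1 << (k - 1)         # PEs a message passes over at level k
        total += d.bit_length()  # equals 1 + floor(log2 d)
    return total


if __name__ == "__main__":
    print(down_sweep(up_sweep([1, 2, 3, 4])))  # [0, 1, 3, 6]
    print(naive_sweep_latency(1024))           # 55, i.e. 1 + 2 + ... + 10
```

The up-sweep and down-sweep each touch every tree level once. The quadratic-in-$\log N$ total comes entirely from the assumed placement, in which messages at higher levels must cross longer stretches of the ring.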