In this paper we prove matching upper and lower bounds for broadcast on general-purpose parallel computation models that exploit network locality. These models try to combine the general-purpose character of models such as the PRAM or BSP with the ability of special-purpose models such as meshes and hypercubes to exploit network locality. They do so by charging a cost $l(|i - j|)$ for a communication between processors $i$ and $j$, where $l$ is a suitably chosen latency function. We give an upper bound of $T(p) = \sum_{i=0}^{\log\log p} 2^i \cdot l(p^{1/2^i})$ on the runtime of a broadcast on a $p$-processor H-PRAM, for an arbitrary latency function $l(k)$. The main contribution of the paper is a matching lower bound that holds for all latency functions in the range from $l(k) = \Omega(\log k / \log\log k)$ to $l(k) = O(\log^2 k)$. This is not a severe restriction: for latency functions $l(k) = O(\log k / \log^{1+\varepsilon}\log k)$ with arbitrary $\varepsilon > 0$, the runtime of the algorithm matches the trivial lower bound $\Omega(\log p)$, and for $l(k) = \Theta(\log^{1+\varepsilon} k)$ or $l(k) = \Theta(k^{\varepsilon})$, it matches the other trivial lower bound $\Omega(l(p))$. Both the upper and the lower bounds also apply to other parallel locality models such as the Y-PRAM, D-BSP, and E-BSP.
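The upper bound is easy to evaluate numerically. The Python sketch below is not taken from the paper; it simply evaluates the sum $T(p)$ for a few illustrative latency functions. It assumes $p$ has the form $2^{2^L}$, so that $\log\log p = L$ is an integer and every $p^{1/2^i}$ is an exact power of two. The helper names (`broadcast_upper_bound`, `unrolled_recurrence`, `constant`, `logarithmic`, `polylog`, `polynomial`) are hypothetical, and the choice $\varepsilon = 0.5$ is arbitrary. The sketch also checks that the sum equals the unrolled recurrence $T(p) = l(p) + 2T(\sqrt{p})$ with $T(2) = l(2)$. That is an algebraic identity and is not meant as a description of the paper's broadcast algorithm.

```python
import math
from typing import Callable

Latency = Callable[[int], float]

def broadcast_upper_bound(L: int, latency: Latency) -> float:
    """Evaluate T(p) = sum_{i=0}^{log log p} 2^i * l(p^(1/2^i)) for p = 2^(2^L).

    Restricting p to doubly-exponential values (an assumption of this sketch)
    makes every p^(1/2^i) the exact integer 2^(2^(L-i)), so no float roots occur.
    """
    total = 0.0
    for i in range(L + 1):            # i = 0 .. log log p
        k = 2 ** (2 ** (L - i))       # p^(1/2^i), computed exactly
        total += 2 ** i * latency(k)
    return total

def unrolled_recurrence(L: int, latency: Latency) -> float:
    """Same quantity via T(p) = l(p) + 2 * T(sqrt(p)), base case T(2) = l(2)."""
    if L == 0:                        # p = 2^(2^0) = 2
        return latency(2)
    return latency(2 ** (2 ** L)) + 2 * unrolled_recurrence(L - 1, latency)

# Illustrative latency functions l(k) (hypothetical examples, eps = 0.5).
def constant(k: int) -> float:
    return 1.0

def logarithmic(k: int) -> float:
    return math.log2(k)

def polylog(k: int, eps: float = 0.5) -> float:
    return math.log2(k) ** (1 + eps)

def polynomial(k: int, eps: float = 0.5) -> float:
    return float(k) ** eps            # float(k) overflows for p beyond 2^1023

if __name__ == "__main__":
    names = ("constant", "log k", "log^1.5 k", "k^0.5")
    funcs = (constant, logarithmic, polylog, polynomial)
    print("log2 p " + "".join(f"{n:>16}" for n in names))
    for L in range(1, 7):
        row = [broadcast_upper_bound(L, f) for f in funcs]
        print(f"{2 ** L:6d} " + "".join(f"{t:16.1f}" for t in row))
```

For constant latency the sum is $2^{L+1} - 1 = 2\log p - 1$, which is consistent with the $\Omega(\log p)$ case. For $l(k) = \log^{1.5} k$ the $i$-th term equals $l(p) \cdot 2^{-0.5 i}$. The terms therefore shrink geometrically and $T(p) < l(p)/(1 - 2^{-0.5})$, which is consistent with the $\Omega(l(p))$ case.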