The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms

Authors:
Phillip B. Gibbons;Yossi Matias;Vijaya Ramachandran
Venue:
SIAM Journal on Computing
Year:
1999

Citing 0
Cited 9

Contention in shared memory algorithms
Journal of the ACM (JACM)

Communication-processor tradeoffs in limited resources PRAM
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures

Task Allocation on a Network of Processors
IEEE Transactions on Computers

Parallel Algorithm Design with Coarse-Grained Synchronization
ICCS '01 Proceedings of the International Conference on Computational Science-Part II

Designing Practical Efficient Algorithms for Symmetric Multiprocessors
ALENEX '99 Selected papers from the International Workshop on Algorithm Engineering and Experimentation

Using PRAM Algorithms on a Uniform-Memory-Access Shared-Memory Architecture
WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering

Emulations between QSM, BSP and LogP: a framework for general-purpose parallel algorithm design
Journal of Parallel and Distributed Computing

Data structures in the multicore age
Communications of the ACM

Modeling and predicting performance of high performance computing applications on hardware accelerators
International Journal of High Performance Computing Applications

Abstract

This paper introduces the queue-read queue-write (QRQW) parallel random access machine (PRAM) model, which permits concurrent reading and writing to shared-memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to this work there were no formal complexity models that accounted for the contention to memory locations, despite its large impact on the performance of parallel programs. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models: the CRCW model does not adequately penalize algorithms with high contention to shared-memory locations, while the EREW model is too strict in its insistence on zero contention at each step.

The QRQW PRAM is strictly more powerful than the EREW PRAM. This paper shows a separation of $\sqrt{\log n}$ between the two models, and presents faster and more efficient QRQW algorithms for several basic problems, such as linear compaction, leader election, and processor allocation. Furthermore, we present a work-preserving emulation of the QRQW PRAM with only logarithmic slowdown on Valiant's BSP model, and hence on hypercube-type noncombining networks, even when latency, synchronization, and memory granularity overheads are taken into account. This matches the best-known emulation result for the EREW PRAM, and considerably improves upon the best-known efficient emulation for the CRCW PRAM on such networks. Finally, the paper presents several lower bound results for this model, including lower bounds on the time required for broadcasting and for leader election.
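The contention charge described in the abstract can be illustrated with a small sketch. It is not the paper's formal definition: it assumes a simplified setting in which every active processor touches at most one shared-memory location per step, and the function name `step_cost` and the list-of-addresses encoding are hypothetical choices made for illustration only.

```python
from collections import Counter


def step_cost(accesses, model="qrqw"):
    """Charge for one synchronous step (simplified, illustrative only).

    accesses: one shared-memory address per active processor
              (hypothetical encoding, not taken from the paper).
    model:    "qrqw", "crcw", or "erew".
    """
    if not accesses:
        return 0
    # Contention = largest number of processors hitting the same location.
    contention = max(Counter(accesses).values())
    if model == "qrqw":
        # Queued access: the step costs as much as the longest queue.
        return contention
    if model == "crcw":
        # Concurrent access is not penalized at all.
        return 1
    if model == "erew":
        # Any concurrent access to one location is disallowed.
        if contention > 1:
            raise ValueError("EREW forbids concurrent access to one location")
        return 1
    raise ValueError(f"unknown model: {model}")
```

Under these assumptions, a step in which three of four processors access the same cell is charged 3 in the QRQW sketch, 1 in the CRCW sketch, and is rejected in the EREW sketch, which mirrors the abstract's point that CRCW under-penalizes contention while EREW forbids it outright.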