This paper introduces the queue-read queue-write (QRQW) parallel random access machine (PRAM) model, which permits concurrent reading and writing to shared-memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to this work there were no formal complexity models that accounted for the contention to memory locations, despite its large impact on the performance of parallel programs. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models: the CRCW model does not adequately penalize algorithms with high contention to shared-memory locations, while the EREW model is too strict in its insistence on zero contention at each step.

The QRQW PRAM is strictly more powerful than the EREW PRAM. This paper shows a separation of $\sqrt{\log n}$ between the two models, and presents faster and more efficient QRQW algorithms for several basic problems, such as linear compaction, leader election, and processor allocation. Furthermore, we present a work-preserving emulation of the QRQW PRAM with only logarithmic slowdown on Valiant's BSP model, and hence on hypercube-type noncombining networks, even when latency, synchronization, and memory granularity overheads are taken into account. This matches the best-known emulation result for the EREW PRAM, and considerably improves upon the best-known efficient emulation for the CRCW PRAM on such networks. Finally, the paper presents several lower bound results for this model, including lower bounds on the time required for broadcasting and for leader election.
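To make the contention-based cost rule concrete, here is a minimal sketch that charges a single PRAM step under the three models the abstract compares. It assumes a simplified rule in which a QRQW step costs its maximum contention (the largest number of processors reading one cell, or writing one cell, with reads and writes counted separately) and ignores local computation. The access encoding and all function names are hypothetical illustrations, not definitions taken from the paper.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# One shared-memory access in a single PRAM step: (processor, address, kind),
# where kind is "r" (read) or "w" (write). This encoding is hypothetical.
Access = Tuple[int, int, str]


def max_contention(step: Iterable[Access]) -> int:
    """Largest number of processors that read one cell, or write one cell.

    Assumption: a step has separate read and write phases, so readers and
    writers of the same cell are counted separately. An empty step counts as 1.
    """
    accesses = list(step)  # materialize so the input can be scanned twice
    readers = Counter(addr for _, addr, kind in accesses if kind == "r")
    writers = Counter(addr for _, addr, kind in accesses if kind == "w")
    return max([1, *readers.values(), *writers.values()])


def qrqw_step_cost(step: List[Access]) -> int:
    # Simplified QRQW charge: time equals the longest queue at any cell
    # (local computation is ignored in this sketch).
    return max_contention(step)


def crcw_step_cost(step: List[Access]) -> int:
    # CRCW charges unit time however many processors share a cell.
    return 1


def erew_step_cost(step: List[Access]) -> Optional[int]:
    # EREW permits no contention at all; None marks an illegal step.
    return 1 if max_contention(step) == 1 else None


def qrqw_time(steps: List[List[Access]]) -> int:
    # Running time of a program is the sum of its step costs.
    return sum(qrqw_step_cost(s) for s in steps)


if __name__ == "__main__":
    n = 8
    broadcast_read = [(p, 0, "r") for p in range(n)]  # everyone reads cell 0
    disjoint_read = [(p, p, "r") for p in range(n)]   # each reads its own cell
    for name, step in [("broadcast", broadcast_read), ("disjoint", disjoint_read)]:
        print(name, qrqw_step_cost(step), crcw_step_cost(step), erew_step_cost(step))
```

Running it prints `broadcast 8 1 None` and then `disjoint 1 1 1`: the all-to-one read that the CRCW model treats as free and the EREW model forbids is charged n = 8 time units under this QRQW rule, while contention-free steps cost the same in all three models.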