Memory contention can be a serious performance bottleneck in concurrent programs on shared-memory multicore architectures. Having all threads write to a small set of shared locations, for example, can lead to orders of magnitude loss in performance relative to all threads writing to distinct locations, or even relative to a single thread doing all the writes. Shared write access, however, can be very useful in parallel algorithms, concurrent data structures, and protocols for communicating among threads.

We study the "priority update" operation as a useful primitive for limiting write contention in parallel and concurrent programs. A priority update takes as arguments a memory location, a new value, and a comparison function p that enforces a partial order over values. The operation atomically compares the new value with the current value in the memory location, and writes the new value only if it has higher priority according to p.

On the implementation side, we show that if implemented appropriately, priority updates greatly reduce memory contention over standard writes or other atomic operations when locations have a high degree of sharing. This is shown both experimentally and theoretically. On the application side, we describe several uses of priority updates for implementing parallel algorithms and concurrent data structures, often in a way that is deterministic, guarantees progress, and avoids serial bottlenecks. We present experiments showing that a variety of such algorithms and data structures perform well under high degrees of sharing.

Given the results, we believe that the priority update operation serves as a useful parallel primitive and good programming abstraction as (1) the user largely need not worry about the degree of sharing, (2) it can be used to avoid non-determinism since, in the common case when p is a total order, priority updates commute, and (3) it has many applications to programs using shared data.