Laws of order: expensive synchronization in concurrent algorithms cannot be eliminated

Authors:
Hagit Attiya; Rachid Guerraoui; Danny Hendler; Petr Kuznetsov; Maged M. Michael; Martin Vechev
Affiliations:
Technion, Haifa, Israel; EPFL, Lausanne, Switzerland; Ben-Gurion University, Beersheba, Israel; TU Berlin/Deutsche Telekom Labs, Berlin, Germany; IBM T. J. Watson Research Center, Yorktown Heights, USA; IBM T. J. Watson Research Center, Yorktown Heights, USA
Venue:
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Year:
2011

Abstract

Building correct and efficient concurrent algorithms is known to be a difficult problem of fundamental importance. To achieve efficiency, designers try to remove unnecessary and costly synchronization. However, this manual trial-and-error process is not only ad hoc, time-consuming and error-prone; it often leaves designers pondering the question: is it inherently impossible to eliminate certain synchronization, or was I simply unable to eliminate it on this attempt, so that I should keep trying? In this paper we answer this question. We prove that it is impossible to build concurrent implementations of classic and ubiquitous specifications such as sets, queues, stacks, mutual exclusion and read-modify-write operations that completely eliminate the use of expensive synchronization. Specifically, we prove that such implementations cannot avoid using at least one of: (i) read-after-write (RAW), where a write to a shared variable A is followed by a read from a different shared variable B without an intervening write to B, or (ii) atomic write-after-read (AWAR), where an atomic operation reads and then writes to shared locations. Unfortunately, enforcing RAW or AWAR is expensive on all current mainstream processors. To enforce RAW, memory-ordering instructions (also called fences or barriers) must be used. To enforce AWAR, atomic instructions such as compare-and-swap are required. Both kinds of instruction are typically substantially slower than regular instructions. Although algorithm designers frequently struggle to avoid RAW and AWAR, their attempts are often futile. Our result characterizes the cases where avoiding RAW and AWAR is impossible. On the flip side, our result can be used to guide designers towards new algorithms where RAW and AWAR can be eliminated.
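As an illustration (our own sketch, not code from the paper), the C11 fragment below shows one instance of each pattern. Peterson's two-thread lock is a textbook RAW algorithm: each thread writes its own flag and then reads the other thread's flag. A fetch-and-increment built from compare-and-swap is an AWAR operation. The function names (`peterson_lock`, `cas_fetch_inc`, `run_peterson_demo`, `run_cas_demo`) and the two-thread harness are hypothetical, and the harness assumes POSIX threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* RAW: Peterson's two-thread lock. peterson_lock() writes interested[self]
 * and turn, then reads interested[other], a different location, with no
 * intervening write to it. Every access uses the default seq_cst ordering,
 * which forbids reordering the later load before the earlier stores; on x86
 * compilers typically emit a locked XCHG, or a MOV followed by MFENCE. */
static atomic_bool interested[2];
static atomic_int turn;

static void peterson_lock(int self) {
    int other = 1 - self;
    atomic_store(&interested[self], true);  /* write to A */
    atomic_store(&turn, other);
    while (atomic_load(&interested[other])  /* read from B */
           && atomic_load(&turn) == other)
        sched_yield();                      /* let the other thread progress */
}

static void peterson_unlock(int self) {
    atomic_store(&interested[self], false);
}

/* AWAR: fetch-and-increment built from compare-and-swap. The CAS reads the
 * counter and writes old + 1 in one atomic step (lock cmpxchg on x86). On
 * failure, including a spurious failure of the weak form, it stores the
 * current value into `old` and the loop retries. */
static int cas_fetch_inc(atomic_int *counter) {
    int old = atomic_load_explicit(counter, memory_order_relaxed);
    while (!atomic_compare_exchange_weak(counter, &old, old + 1))
        ; /* `old` now holds the current value */
    return old;
}

/* Hypothetical demo harness: two POSIX threads exercise each primitive. */
static long plain_counter;          /* guarded only by the Peterson lock */
static atomic_int atomic_counter;   /* updated only through cas_fetch_inc */
static int demo_iters;

static void *peterson_worker(void *arg) {
    int self = *(const int *)arg;
    for (int i = 0; i < demo_iters; i++) {
        peterson_lock(self);
        plain_counter++;            /* critical section */
        peterson_unlock(self);
    }
    return NULL;
}

static void *cas_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < demo_iters; i++)
        cas_fetch_inc(&atomic_counter);
    return NULL;
}

/* Starts threads 0 and 1 on `worker` and waits for both to finish. */
static void run_two(void *(*worker)(void *), int iters) {
    static const int ids[2] = {0, 1};
    pthread_t t[2];
    demo_iters = iters;
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, (void *)&ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
}

long run_peterson_demo(int iters) {
    plain_counter = 0;
    run_two(peterson_worker, iters);
    return plain_counter;
}

int run_cas_demo(int iters) {
    atomic_store(&atomic_counter, 0);
    run_two(cas_worker, iters);
    return atomic_load(&atomic_counter);
}
```

With two threads each performing 20000 iterations, both `run_peterson_demo(20000)` and `run_cas_demo(20000)` return 40000. Weakening the accesses in `peterson_lock` to `memory_order_relaxed` would drop the fence, but it would also let both threads read the other's flag as false and enter the critical section together. This is consistent with the claim that a correct lock cannot avoid both RAW and AWAR.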