Building correct and efficient concurrent algorithms is a difficult problem of fundamental importance. To achieve efficiency, designers try to remove unnecessary and costly synchronization. This manual trial-and-error process is ad hoc, time-consuming, and error-prone, and it often leaves designers pondering the question: is it inherently impossible to eliminate certain synchronization, or was I merely unable to eliminate it on this attempt, so that I should keep trying? In this paper we answer this question. We prove that it is impossible to build concurrent implementations of classic and ubiquitous specifications, such as sets, queues, stacks, mutual exclusion, and read-modify-write operations, that completely eliminate the use of expensive synchronization. Specifically, we prove that one cannot avoid using at least one of two patterns: i) read-after-write (RAW), where a write to a shared variable A is followed by a read of a different shared variable B without a write to B in between, or ii) atomic write-after-read (AWAR), where an atomic operation reads and then writes to shared locations. Unfortunately, enforcing RAW or AWAR is expensive on all current mainstream processors. To enforce RAW, memory ordering instructions (also called fences or barriers) must be used. To enforce AWAR, atomic instructions such as compare-and-swap are required. These instructions are typically substantially slower than regular instructions. Although algorithm designers frequently struggle to avoid RAW and AWAR, their attempts are often futile. Our result characterizes the cases where avoiding RAW and AWAR is impossible. On the flip side, our result can be used to guide designers towards new algorithms where RAW and AWAR can be eliminated.
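To make the two patterns concrete, the sketch below restates the RAW and AWAR definitions from the abstract as checks over the sequence of shared-memory operations issued by a single thread. It is an illustration only, not the paper's formal model: the `Op` record, the operation kinds, the function names `has_raw` and `has_awar`, and the two example traces are all assumptions introduced here. In particular, an atomic read-modify-write is treated as one `"rmw"` step that counts as a write to its variable for the RAW check.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    """One shared-memory operation in a single thread's trace (hypothetical model)."""
    kind: str  # "read", "write", or "rmw" (atomic read-then-write, e.g. compare-and-swap)
    var: str   # name of the shared variable accessed


def has_awar(trace: List[Op]) -> bool:
    """AWAR: some atomic operation reads and then writes shared memory."""
    return any(op.kind == "rmw" for op in trace)


def has_raw(trace: List[Op]) -> bool:
    """RAW: a write to A is followed by a read of a different variable B,
    with no write to B in between."""
    for i, first in enumerate(trace):
        if first.kind != "write":
            continue
        written_since = set()  # variables written after `first`
        for later in trace[i + 1:]:
            if later.kind in ("write", "rmw"):
                # Simplifying assumption: an rmw counts as a write to its variable.
                written_since.add(later.var)
            elif later.var != first.var and later.var not in written_since:
                return True  # plain read of B != A, B not overwritten in between
    return False


# Dekker-style entry: announce intent, then inspect the other thread's flag.
DEKKER_ENTRY = [Op("write", "flag0"), Op("read", "flag1")]

# Test-and-set style lock acquisition through a single atomic operation.
CAS_LOCK = [Op("rmw", "lock")]

# A write to x, then a read of y that was overwritten in between: not RAW.
NO_PATTERN = [Op("write", "x"), Op("write", "y"), Op("read", "y")]
```

Under these assumptions, `DEKKER_ENTRY` exhibits RAW (the store to `flag0` must be ordered before the load of `flag1`, which on mainstream hardware typically requires a fence), `CAS_LOCK` exhibits AWAR, and `NO_PATTERN` exhibits neither.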