Scalable queue-based spin locks with timeout

Authors:
Michael L. Scott;William N. Scherer
Affiliations:
Department of Computer Science, University of Rochester, Rochester, NY;Department of Computer Science, University of Rochester, Rochester, NY
Venue:
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Year:
2001

Citing 7
Cited 27

Synchronization Algorithms for Shared-Memory Multiprocessors

Computer
Wait-free synchronization

ACM Transactions on Programming Languages and Systems (TOPLAS)
Algorithms for scalable synchronization on shared-memory multiprocessors

ACM Transactions on Computer Systems (TOCS)
Scheduler-conscious synchronization

ACM Transactions on Computer Systems (TOCS)
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Queue Locks on Cache Coherent Multiprocessors

Proceedings of the 8th International Symposium on Parallel Processing
WildFire: A Scalable Path for SMPs

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture

Non-blocking timeout in scalable queue-based spin locks

Proceedings of the twenty-first annual symposium on Principles of distributed computing
Efficient synchronization for nonuniform communication architectures

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Hierarchical Backoff Locks for Nonuniform Communication Architectures

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Adaptive and efficient abortable mutual exclusion

Proceedings of the twenty-second annual symposium on Principles of distributed computing
Shared-memory mutual exclusion: major research trends since 1986

Distributed Computing - Papers in celebration of the 20th anniversary of PODC
Scalable synchronous queues

Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
The java.util.concurrent synchronizer framework

Science of Computer Programming - Special issue: Concurrency and synchronization in Java programs
Reordering constraints for pthread-style locks

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Efficient self-tuning spin-locks using competitive analysis

Journal of Systems and Software
Enabling scalability and performance in a large scale CMP environment

Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Self-stabilizing philosophers with generic conflicts

ACM Transactions on Autonomous and Adaptive Systems (TAAS)
The semantics of progress in lock-based transactional memory

Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Transactional Memory: Glimmer of a Theory

CAV '09 Proceedings of the 21st International Conference on Computer Aided Verification
Flat combining and the synchronization-parallelism tradeoff

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Design principles for end-to-end multicore schedulers

HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Composite abortable locks

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Architectural Support for Fair Reader-Writer Locking

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Fast local-spin abortable mutual exclusion with bounded space

OPODIS'10 Proceedings of the 14th international conference on Principles of distributed systems
Preemption adaptivity in time-published queue-based spin locks

HiPC'05 Proceedings of the 12th international conference on High Performance Computing
A hierarchical CLH queue lock

Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Lock cohorting: a general technique for designing NUMA locks

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Remote core locking: migrating critical-section execution to improve the performance of multithreaded applications

USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Abortable reader-writer locks are no more complex than abortable mutex locks

DISC'12 Proceedings of the 26th international conference on Distributed Computing
On deterministic abortable objects

Proceedings of the 2013 ACM symposium on Principles of distributed computing
Location-aware cache management for many-core processors with deep cache hierarchy

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

ACM SIGOPS 24th Symposium on Operating Systems Principles
Everything you always wanted to know about synchronization but were afraid to ask

Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

Quantified Score

Hi-index	0.00

Visualization

Abstract

Queue-based spin locks allow programs with busy-wait synchronization to scale to very large multiprocessors, without fear of starvation or performance-destroying contention. So-called try locks, traditionally based on non-scalable test-and-set locks, allow a process to abandon its attempt to acquire a lock after a given amount of time. The process can then pursue an alternative code path, or yield the processor to some other process.We demonstrate that it is possible to obtain both scalability and bounded waiting, using variants of the queue-based locks of Craig, Landin, and Hagersten, and of Mellor-Crummey and Scott. A process that decides to stop waiting for one of these new locks can ``link itself out of line'' atomically. Single-processor experiments reveal performance penalties of 50--100\% for the CLH and MCS try locks in comparison to their standard versions; this marginal cost decreases with larger numbers of processors.We have also compared our queue-based locks to a traditional \tatas\ lock with exponential backoff and timeout. At modest (non-zero) levels of contention, the queued locks sacrifice cache locality for fairness, resulting in a worst-case 3X performance penalty. At high levels of contention, however, they display a 1.5--2X performance advantage, with significantly more regular timings and significantly higher rates of acquisition prior to timeout.