A hierarchical CLH queue lock

Authors:
Victor Luchangco;Dan Nussbaum;Nir Shavit
Affiliations:
Sun Microsystems Laboratories;Sun Microsystems Laboratories;Sun Microsystems Laboratories
Venue:
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Year:
2006

References:
Adaptive backoff synchronization techniques. ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems (TOCS)
Scalable queue-based spin locks with timeout. PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors. IEEE Transactions on Parallel and Distributed Systems
Queue Locks on Cache Coherent Multiprocessors. Proceedings of the 8th International Symposium on Parallel Processing
Hierarchical Backoff Locks for Nonuniform Communication Architectures. HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Niagara: A 32-Way Multithreaded Sparc Processor. IEEE Micro

Abstract

Modern multiprocessor architectures such as CC-NUMA machines or CMPs have nonuniform communication architectures that render programs sensitive to memory access locality. A recent paper by Radović and Hagersten shows that performance gains can be obtained by developing general-purpose mutual-exclusion locks that encourage threads with high mutual memory locality to acquire the lock consecutively, thus reducing the overall cost due to cache misses. Radović and Hagersten present the first such hierarchical locks. Unfortunately, their locks are backoff locks, which are known to incur higher cache miss rates than queue-based locks, suffer from various fundamental fairness issues, and are hard to tune so as to maximize locality of lock accesses. Extending queue-locking algorithms to be hierarchical requires that requests from threads with high mutual memory locality be consecutive in the queue. Until now, it was not clear that one could design such locks because collecting requests locally and moving them into a global queue seemingly requires a level of coordination whose cost would defeat the very purpose of hierarchical locking. This paper presents a hierarchical version of the Craig, Landin, and Hagersten CLH queue lock, which we call the HCLH queue lock. In this algorithm, threads build implicit local queues of waiting threads, splicing them into a global queue at the cost of only a single compare-and-swap (CAS) operation. In a set of microbenchmarks run on a large-scale multiprocessor machine and a state-of-the-art multi-threaded multi-core chip, the HCLH algorithm exhibits better performance and significantly better fairness than the hierarchical backoff locks of Radović and Hagersten.
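For readers unfamiliar with the underlying queue lock, the following is a minimal sketch of a flat (non-hierarchical) CLH lock, the building block the abstract refers to. It is not the HCLH algorithm from the paper: it omits the per-cluster local queues and the splice into a global queue. All names (`CLHLock`, `_Node`, `_swap_tail`, `run_demo`) are hypothetical, and because Python has no atomic exchange instruction, the swap on the tail pointer is emulated here with a small mutex as a stand-in for the hardware primitive.

```python
import threading
import time


class _Node:
    """Queue node; `locked` is True while its owner holds or awaits the lock."""
    __slots__ = ("locked",)

    def __init__(self, locked=False):
        self.locked = locked


class CLHLock:
    """Sketch of a flat CLH queue lock (not the hierarchical HCLH variant)."""

    def __init__(self):
        self._tail = _Node(locked=False)     # dummy node: the lock starts free
        self._swap_guard = threading.Lock()  # assumption: emulates an atomic swap
        self._local = threading.local()      # per-thread node bookkeeping

    def _swap_tail(self, node):
        # Stand-in for a single atomic exchange on the tail pointer.
        with self._swap_guard:
            prev, self._tail = self._tail, node
        return prev

    def acquire(self):
        node = getattr(self._local, "node", None) or _Node()
        node.locked = True
        pred = self._swap_tail(node)         # enqueue behind the current tail
        while pred.locked:                   # spin only on the predecessor's flag
            time.sleep(0)                    # yield so other Python threads can run
        self._local.node = node
        self._local.pred = pred

    def release(self):
        node = self._local.node
        node.locked = False                  # hand the lock to the successor
        self._local.node = self._local.pred  # recycle the predecessor's node


def run_demo(num_threads=4, iterations=100):
    """Increment a shared counter under the lock and return its final value."""
    lock = CLHLock()
    counter = [0]

    def worker():
        for _ in range(iterations):
            lock.acquire()
            try:
                value = counter[0]
                time.sleep(0)                # widen the window for a lost update
                counter[0] = value + 1
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


if __name__ == "__main__":
    print(run_demo())
```

Each waiter spins on a flag written only by its predecessor, which is what gives queue locks their FIFO fairness and low cache traffic compared with backoff locks. A hierarchical variant would additionally need to keep requests from the same cluster adjacent in the queue, which this sketch does not attempt.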