Flat-combining NUMA locks

Authors:
Dave Dice;Virendra J. Marathe;Nir Shavit
Affiliations:
Oracle Labs, Burlington, MA, USA;Oracle Labs, Burlington, MA, USA;Oracle Labs, Burlington, MA, USA
Venue:
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Year:
2011

Citing 8
Cited 4

Renaming in an asynchronous environment

Journal of the ACM (JACM)
Algorithms for scalable synchronization on shared-memory multiprocessors

ACM Transactions on Computer Systems (TOCS)
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Queue Locks on Cache Coherent Multiprocessors

Proceedings of the 8th International Symposium on Parallel Processing
Hierarchical Backoff Locks for Nonuniform Communication Architectures

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
The Art of Multiprocessor Programming

The Art of Multiprocessor Programming
Flat combining and the synchronization-parallelism tradeoff

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
A hierarchical CLH queue lock

Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing

Lock cohorting: a general technique for designing NUMA locks

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Revisiting the combining synchronization technique

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
SALSA: scalable and low synchronization NUMA-aware algorithm for producer-consumer pools

Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
NUMA-aware reader-writer locks

Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming

Quantified Score

Hi-index	0.00

Visualization

Abstract

Multicore machines are growing in size, and accordingly shifting from simple bus-based designs to NUMA and CCNUMA architectures. With this shift, the need for scalable hierarchical locking algorithms is becoming crucial to performance. This paper presents a novel scalable hierarchical queue-lock algorithm based on the flat combining synchronization paradigm. At the core of the new algorithm is a scheme for building local queues of waiting threads in a highly efficient manner, and then merging them globally, all with little interconnect traffic and virtually no costly synchronization operations in the common case. In empirical testing on an Oracle SPARC Enterprise T5440 Server, a 256-way CC-NUMA machine, our new flat-combining hierarchical lock significantly outperforms all classic locking algorithms, and at high concurrency levels, provides up to a factor of two improvement over HCLH, the most efficient known hierarchical locking algorithm.