Fast concurrent queues for x86 processors

Authors:
Adam Morrison; Yehuda Afek
Affiliations:
Tel Aviv University, Tel Aviv, Israel; Tel Aviv University, Tel Aviv, Israel
Venue:
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
2013

Abstract

Conventional wisdom in designing concurrent data structures is to use the most powerful synchronization primitive, namely compare-and-swap (CAS), and to avoid contended hot spots. In building concurrent FIFO queues, this reasoning has led researchers to propose combining-based concurrent queues. This paper takes a different approach, showing how to rely on fetch-and-add (F&A), a less powerful primitive that is available on x86 processors, to construct a nonblocking (lock-free) linearizable concurrent FIFO queue. Despite the F&A being a contended hot spot, this queue outperforms combining-based implementations by 1.5x to 2.5x at all concurrency levels on an x86 server with four multicore processors, in both single-processor and multi-processor executions.
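
To make the contrast between the two primitives concrete, here is a minimal C++ sketch. It is not the queue from the paper. It only shows threads claiming slot indices from a shared counter, once with a CAS retry loop and once with a single fetch-and-add. The names `claim_with_cas`, `claim_with_faa`, `count_claims`, and `all_claimed_once` are hypothetical helpers introduced for illustration. On x86, compilers typically lower `fetch_add` to a `LOCK XADD` instruction, which always completes in one step, whereas a contended CAS can fail and must be retried.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: claim the next index with a CAS retry loop.
// Under contention a thread's CAS may fail, forcing it to retry.
inline std::uint64_t claim_with_cas(std::atomic<std::uint64_t>& counter) {
    std::uint64_t seen = counter.load();
    while (!counter.compare_exchange_weak(seen, seen + 1)) {
        // On failure, compare_exchange_weak reloads `seen`; try again.
    }
    return seen;
}

// Hypothetical helper: claim the next index with one fetch-and-add.
// The operation cannot fail, so there is no retry loop.
inline std::uint64_t claim_with_faa(std::atomic<std::uint64_t>& counter) {
    return counter.fetch_add(1);
}

// Runs `threads` workers that each claim `per_thread` indices from a
// shared counter, then returns how many times each index was claimed.
template <typename Claim>
std::vector<int> count_claims(Claim claim, int threads, int per_thread) {
    assert(threads >= 0 && per_thread >= 0);
    std::atomic<std::uint64_t> counter{0};
    // Each worker records its claims in its own vector (no sharing).
    std::vector<std::vector<std::uint64_t>> claimed(threads);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            for (int i = 0; i < per_thread; ++i) {
                claimed[t].push_back(claim(counter));
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    std::vector<int> hits(static_cast<std::size_t>(threads) * per_thread, 0);
    for (const auto& mine : claimed) {
        for (std::uint64_t idx : mine) {
            ++hits[idx];
        }
    }
    return hits;
}

// True when every index was handed out to exactly one claimant.
inline bool all_claimed_once(const std::vector<int>& hits) {
    for (int h : hits) {
        if (h != 1) return false;
    }
    return true;
}
```

Calling `all_claimed_once(count_claims(claim_with_faa, 4, 1000))` checks that four threads received 4000 distinct indices. Both helpers hand out unique indices. They differ in that the F&A version does a bounded amount of work per claim even when the counter is a contended hot spot.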