A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems

Authors:
Philippas Tsigas; Yi Zhang
Affiliations:
Department of Computing Science, Chalmers University of Technology, SE-412 96 Göteborg, Sweden;Department of Computing Science, Chalmers University of Technology, SE-412 96 Göteborg, Sweden
Venue:
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Year:
2001



Abstract

This paper presents a non-blocking FIFO queue algorithm for shared memory multiprocessor systems. The algorithm is simple and fast, and it scales well on both symmetric and non-symmetric shared memory multiprocessors. Experiments on a 64-node SUN Enterprise 10000 (a symmetric multiprocessor system) and on a 64-node SGI Origin 2000 (a cache-coherent non-uniform memory access multiprocessor system) indicate that our algorithm considerably outperforms the best of the known alternatives on both machines at any level of multiprogramming. This work introduces two new, simple algorithmic mechanisms. The first lowers contention on key variables used by the concurrent enqueue and/or dequeue operations, which accounts for the good performance of the algorithm. The second deals with the pointer recycling problem, an inconsistency problem that all non-blocking algorithms based on the compare-and-swap synchronisation primitive have to address. We chose compare-and-swap for our construction because it is an atomic primitive that scales well under contention and is either supported by modern multiprocessors or can be implemented efficiently on them.
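
The pointer recycling problem mentioned in the abstract can be illustrated with a small, deterministic sketch. It is not the paper's algorithm: the class `AtomicCell`, the slot names, and both functions are hypothetical, compare-and-swap is simulated with a lock because Python exposes no hardware primitive, and the version-counter remedy shown is a commonly used general technique rather than the mechanism the authors introduce.

```python
import threading


class AtomicCell:
    """Simulated atomic word (assumption: a lock stands in for hardware CAS)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell still holds `expected`; report success."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def stale_cas_plain():
    """A delayed CAS is accepted although the cell changed twice in between."""
    head = AtomicCell("slot_A")
    seen = head.load()                             # slow thread reads, then stalls
    head.compare_and_swap("slot_A", "slot_B")      # other threads advance the head
    head.compare_and_swap("slot_B", "slot_A")      # slot_A is recycled
    return head.compare_and_swap(seen, "slot_C")   # stale update goes through


def stale_cas_tagged():
    """Pairing the value with a version counter makes the stale CAS fail."""
    head = AtomicCell(("slot_A", 0))
    seen = head.load()                             # slow thread reads version 0
    head.compare_and_swap(("slot_A", 0), ("slot_B", 1))
    head.compare_and_swap(("slot_B", 1), ("slot_A", 2))
    return head.compare_and_swap(seen, ("slot_C", 1))  # version 0 no longer matches
```

In the plain variant the slow thread cannot tell that the value it read was removed and reinstated, so its update is applied to a state it never observed. The tagged variant rejects that update, at the cost of widening the word that compare-and-swap must cover.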