Linearizability: a correctness condition for concurrent objects
ACM Transactions on Programming Languages and Systems (TOPLAS)
ACM Transactions on Programming Languages and Systems (TOPLAS)
Algorithms for scalable synchronization on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Lock-free data structures
Empirical studies of competitve spinning for a shared-memory multiprocessor
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Nonblocking algorithms and preemption-safe locking on multiprogrammed shared memory multiprocessors
Journal of Parallel and Distributed Computing
Specifying Concurrent Program Modules
ACM Transactions on Programming Languages and Systems (TOPLAS)
Starfire: Extending the SMP Envelope
IEEE Micro
A Nonblocking Algorithm for Shared Queues Using Compare-and-Swap
IEEE Transactions on Computers
The Effect of Scheduling Discipline on Spin Overhead in Shared Memory Parallel Systems
IEEE Transactions on Parallel and Distributed Systems
WOSP '02 Proceedings of the 3rd international workshop on Software and performance
Wait-Free Reference Counting and Memory Management
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Designing irregular parallel algorithms with mutual exclusion and lock-free protocols
Journal of Parallel and Distributed Computing
FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
FaCSim: a fast and cycle-accurate architecture simulator for embedded systems
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
On dynamic load balancing on graphics processors
Proceedings of the 23rd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Practical, Fast and Simple Concurrent FIFO Queues Using Single Word Synchronization Primitives
Ada-Europe '08 Proceedings of the 13th Ada-Europe international conference on Reliable Software Technologies
Proceedings of the Second Workshop on Isolation and Integration in Embedded Systems
Non-blocking Array-Based Algorithms for Stacks and Queues
ICDCN '09 Proceedings of the 10th International Conference on Distributed Computing and Networking
On the design and implementation of a shared memory dispatcher for partially clairvoyant schedulers
International Journal of Parallel Programming
On sorting and load balancing on GPUs
ACM SIGARCH Computer Architecture News
NOBLE: non-blocking programming support via lock-free shared abstract data types
ACM SIGARCH Computer Architecture News
LFTHREADS: a lock-free thread library
OPODIS'07 Proceedings of the 11th international conference on Principles of distributed systems
OPODIS'07 Proceedings of the 11th international conference on Principles of distributed systems
Cache-aware lock-free queues for multiple producers/consumers and weak memory consistency
OPODIS'10 Proceedings of the 14th international conference on Principles of distributed systems
Wait-free queues with multiple enqueuers and dequeuers
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
A lock-free algorithm for concurrent bags
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Progress guarantees when composing lock-free objects
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Toward high-throughput algorithms on many-core architectures
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Allocating memory in a lock-free manner
ESA'05 Proceedings of the 13th annual European conference on Algorithms
A methodology for creating fast wait-free data structures
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Lock-Free parallel algorithms: an experimental study
HiPC'04 Proceedings of the 11th international conference on High Performance Computing
On the implementation of concurrent objects
Dependable and Historic Computing
Compiler support for lightweight context switching
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
An efficient unbounded lock-free queue for multi-core systems
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Understanding the performance of concurrent data structures on graphics processors
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Fast concurrent queues for x86 processors
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Scalable SIMD-parallel memory allocation for many-core machines
The Journal of Supercomputing
Hi-index | 0.00 |
A non-blocking FIFO queue algorithm for multiprocessor shared memory systems is presented in this paper. The algorithm is very simple, fast and scales very well in both symmetric and non-symmetric multiprocessor shared memory systems. Experiments on a 64-node SUN Enterprise 10000 — a symmetric multiprocessorsystem — and on a 64-node SGI Origin 2000 — a cache coherent non uniform memory access multiprocessorsystem — indicate that our algorithm considerably outperforms the best of the known alternatives in both multiprocessors in any level of multiprogramming. This work introduces two new, simple algorithmic mechanisms. The first lowers the contention to key variables used by the concurrent enqueue and/or dequeue operations which consequently results in the good performance of the algorithm, the second deals with the pointer recycling problem, an inconsistency problem that all non-blocking algorithms based on the compare-and-swap synchronisation primitive have to address. In our construction we selected to use compare-and-swap since compare-and-swap is an atomic primitive that scales well under contention and either is supported by modern multiprocessors or can be implemented efficiently on them.