We present a highly scalable non-blocking producer-consumer task pool, designed with a special emphasis on lightweight synchronization and data locality. The core building block of our pool is SALSA (Scalable And Low Synchronization Algorithm), an algorithm for a single-consumer container with task-stealing support. Each consumer operates on its own SALSA container and steals tasks from other containers only when necessary. We implement a self-tuning policy for task insertion that avoids pushing tasks to overloaded SALSA containers, which decreases the likelihood of stealing. SALSA manages tasks in large chunks, which improves locality and facilitates stealing. SALSA uses a novel approach to coordination among consumers that requires no strong atomic operations or memory barriers on the fast path, and it invokes only two CAS operations during a chunk steal. Our evaluation demonstrates that a pool built from SALSA containers scales linearly with the number of threads and significantly outperforms other FIFO and non-FIFO alternatives.
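To make the structure described above concrete, the following is a minimal single-threaded sketch of a pool with per-consumer containers, chunked task storage, an insertion policy that skips overloaded containers, and whole-chunk stealing. It is an illustration only and does not reproduce SALSA's synchronization protocol: the names `Chunk`, `Container`, `Pool`, `CHUNK_SIZE`, and `MAX_CHUNKS` are hypothetical, the overload test is an assumed stand-in for the self-tuning policy, and `cas_owner` is a plain method standing in for an atomic compare-and-swap (the sketch uses one such step per steal, not the two CAS operations the abstract mentions).

```python
from collections import deque

CHUNK_SIZE = 4   # hypothetical: tasks per chunk
MAX_CHUNKS = 2   # hypothetical: full chunks after which a container counts as overloaded


class Chunk:
    """A fixed-size batch of tasks owned by one consumer at a time."""

    def __init__(self, owner):
        self.tasks = []
        self.owner = owner

    def cas_owner(self, expected, new):
        # Stand-in for an atomic compare-and-swap; this sketch is single-threaded.
        if self.owner != expected:
            return False
        self.owner = new
        return True


class Container:
    """Per-consumer container holding a queue of chunks."""

    def __init__(self, cid):
        self.cid = cid
        self.chunks = deque()

    def overloaded(self):
        # Assumed overload test: the container holds MAX_CHUNKS chunks and the last is full.
        return len(self.chunks) >= MAX_CHUNKS and len(self.chunks[-1].tasks) >= CHUNK_SIZE

    def put(self, task):
        # Open a new chunk when there is none or the last one is full.
        if not self.chunks or len(self.chunks[-1].tasks) >= CHUNK_SIZE:
            self.chunks.append(Chunk(owner=self.cid))
        self.chunks[-1].tasks.append(task)

    def take(self):
        # Drain the oldest chunk first, discarding chunks once they are empty.
        while self.chunks:
            chunk = self.chunks[0]
            if chunk.tasks:
                return chunk.tasks.pop(0)
            self.chunks.popleft()
        return None


class Pool:
    def __init__(self, n_consumers):
        self.containers = [Container(i) for i in range(n_consumers)]

    def produce(self, task, preferred):
        """Insert into the preferred container unless it is overloaded; else try the next."""
        n = len(self.containers)
        for k in range(n):
            c = self.containers[(preferred + k) % n]
            if not c.overloaded():
                c.put(task)
                return c.cid
        return None  # every container is overloaded; the caller decides what to do

    def consume(self, cid):
        """Take from the consumer's own container; steal a whole chunk if it is empty."""
        own = self.containers[cid]
        task = own.take()
        if task is not None:
            return task
        for victim in self.containers:
            if victim.cid == cid or not victim.chunks:
                continue
            chunk = victim.chunks[-1]
            if chunk.tasks and chunk.cas_owner(victim.cid, cid):
                victim.chunks.remove(chunk)   # move the whole chunk, not a single task
                own.chunks.append(chunk)
                return own.take()
        return None


if __name__ == "__main__":
    pool = Pool(2)
    for t in range(10):
        pool.produce(t, preferred=0)   # the last two tasks spill to container 1
    print([pool.consume(1) for _ in range(3)])
```

In this sketch, stealing transfers an entire chunk in one ownership change, so the thief then works through the remaining tasks from its own container without touching the victim again. That is the locality argument the abstract makes for chunk-level management.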