SALSA: scalable and low synchronization NUMA-aware algorithm for producer-consumer pools

Authors:
Elad Gidron; Idit Keidar; Dmitri Perelman; Yonathan Perez
Affiliations:
Technion, Haifa, Israel; Technion, Haifa, Israel; Technion, Haifa, Israel; Technion, Haifa, Israel
Venue:
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Year:
2012

Abstract

We present a highly scalable non-blocking producer-consumer task pool, designed with a special emphasis on lightweight synchronization and data locality. The core building block of our pool is SALSA (Scalable And Low Synchronization Algorithm), an algorithm for a single-consumer container with task-stealing support. Each consumer operates on its own SALSA container and steals tasks from other containers when necessary. We implement an elegant self-tuning policy for task insertion that does not push tasks to overloaded SALSA containers, thus decreasing the likelihood of stealing. SALSA manages tasks in large chunks, which improves locality and facilitates stealing. SALSA uses a novel approach for coordination among consumers, with no strong atomic operations or memory barriers in the fast path; it invokes only two CAS operations during a chunk steal. Our evaluation demonstrates that a pool built from SALSA containers scales linearly with the number of threads and significantly outperforms other FIFO and non-FIFO alternatives.
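The pool structure described in the abstract can be illustrated with a small single-threaded model. This is only a sketch of the data layout (per-consumer containers made of chunks, insertion that skips overloaded containers, and stealing a whole chunk at a time); it is not the SALSA algorithm itself, and it deliberately omits the CAS-based synchronization that is the paper's contribution. All names (`Container`, `Pool`, `try_put`, `steal_chunk_from`) and the constants `CHUNK_SIZE` and `MAX_CHUNKS` are hypothetical choices made for illustration.

```python
from collections import deque

CHUNK_SIZE = 4   # hypothetical number of tasks per chunk
MAX_CHUNKS = 2   # hypothetical limit after which a container counts as overloaded


class Container:
    """One consumer's container: a queue of chunks, each a deque of tasks."""

    def __init__(self):
        self.chunks = deque()

    def try_put(self, task):
        # Open a new chunk when there is none or the newest one is full.
        if not self.chunks or len(self.chunks[-1]) >= CHUNK_SIZE:
            if len(self.chunks) >= MAX_CHUNKS:
                return False  # overloaded: the producer should look elsewhere
            self.chunks.append(deque())
        self.chunks[-1].append(task)
        return True

    def take(self):
        # Consume from the oldest chunk, discarding chunks that are drained.
        while self.chunks:
            if self.chunks[0]:
                return self.chunks[0].popleft()
            self.chunks.popleft()
        return None

    def steal_chunk_from(self, victim):
        # Move one whole non-empty chunk from the victim to this container.
        for i, chunk in enumerate(victim.chunks):
            if chunk:
                del victim.chunks[i]
                self.chunks.append(chunk)
                return True
        return False


class Pool:
    """A pool with one container per consumer. Tasks must not be None."""

    def __init__(self, n_consumers):
        self.containers = [Container() for _ in range(n_consumers)]

    def put(self, task, preferred):
        # Try the preferred container first, then the others in order.
        n = len(self.containers)
        for offset in range(n):
            idx = (preferred + offset) % n
            if self.containers[idx].try_put(task):
                return idx
        return None  # every container is overloaded

    def get(self, consumer):
        own = self.containers[consumer]
        task = own.take()
        if task is not None:
            return task
        # Own container is empty: steal a chunk, then consume from it locally.
        for victim in self.containers:
            if victim is not own and own.steal_chunk_from(victim):
                return own.take()
        return None


pool = Pool(2)
placed = [pool.put(t, preferred=0) for t in range(10)]
print(placed)                            # container index chosen for each task
print([pool.get(1) for _ in range(3)])   # consumer 1 drains its own tasks, then steals
```

In this model a producer that always prefers container 0 spills over to container 1 once container 0 holds `MAX_CHUNKS` full chunks, and consumer 1 takes an entire chunk from container 0 once its own tasks run out, so later `get` calls are served locally from the stolen chunk.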