The use of efficient synchronization mechanisms is crucial for implementing fine-grained parallel programs on modern shared-cache multi-core architectures. In this paper we study this problem by considering Single-Producer/Single-Consumer (SPSC) coordination using unbounded queues. We present a novel unbounded SPSC algorithm capable of reducing the raw synchronization latency and speeding up Producer-Consumer coordination. The algorithm has been extensively tested on a shared-cache multi-core platform, and a proof sketch of its correctness is given. The proposed queues are used as basic building blocks of the FastFlow parallel framework, which has been shown to offer very good performance for fine-grained parallel applications.
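The following minimal C++ sketch, which assumes C++11 atomics, illustrates one common way to build an unbounded SPSC queue. It chains bounded lock-free ring buffers. When the current buffer is full, the producer links in a fresh one. The consumer moves on only after it has drained a buffer whose successor link is visible. The class names `SpscRing` and `UnboundedSpsc` and the `segment_capacity` parameter are hypothetical and are not FastFlow's API. The sketch also skips the cache-oriented optimizations a production queue would likely use, such as recycling buffers through a pool instead of freeing them. It is not the paper's algorithm.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical bounded SPSC ring (Lamport-style head/tail indices).
// push() must be called only by the producer, pop() only by the consumer.
template <typename T>
class SpscRing {
public:
    // One slot is kept free to distinguish "full" from "empty".
    explicit SpscRing(std::size_t capacity) : buf_(capacity + 1) {}

    bool push(const T& v) {
        std::size_t h = head_.load(std::memory_order_relaxed);
        std::size_t next = (h + 1) % buf_.size();
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[h] = v;
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    bool pop(T& out) {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire)) return false;  // empty
        out = buf_[t];
        tail_.store((t + 1) % buf_.size(), std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_{0};  // next slot to write (producer-owned)
    std::atomic<std::size_t> tail_{0};  // next slot to read (consumer-owned)
};

// Hypothetical unbounded SPSC queue: a linked list of bounded segments.
template <typename T>
class UnboundedSpsc {
    struct Segment {
        explicit Segment(std::size_t cap) : ring(cap) {}
        SpscRing<T> ring;
        std::atomic<Segment*> next{nullptr};
    };

public:
    explicit UnboundedSpsc(std::size_t segment_capacity = 64)
        : cap_(segment_capacity ? segment_capacity : 1),
          write_(new Segment(cap_)),
          read_(write_) {}

    // Call only when neither thread is using the queue any more.
    ~UnboundedSpsc() {
        while (read_) {
            Segment* n = read_->next.load(std::memory_order_relaxed);
            delete read_;
            read_ = n;
        }
    }
    UnboundedSpsc(const UnboundedSpsc&) = delete;
    UnboundedSpsc& operator=(const UnboundedSpsc&) = delete;

    // Producer side: never fails (allocates a new segment when full).
    void push(const T& v) {
        if (write_->ring.push(v)) return;
        Segment* fresh = new Segment(cap_);
        fresh->ring.push(v);  // cannot fail: the fresh segment is empty
        // Release: all earlier writes to the old segment happen-before this link.
        write_->next.store(fresh, std::memory_order_release);
        write_ = fresh;  // the producer never touches the old segment again
    }

    // Consumer side: returns false if no item is currently available.
    bool pop(T& out) {
        if (read_->ring.pop(out)) return true;
        Segment* n = read_->next.load(std::memory_order_acquire);
        if (n == nullptr) return false;
        // Re-check: items pushed before the link are now guaranteed visible.
        if (read_->ring.pop(out)) return true;
        delete read_;  // drained and abandoned by the producer
        read_ = n;
        return read_->ring.pop(out);  // the new segment holds at least one item
    }

private:
    std::size_t cap_;
    Segment* write_;  // producer-owned
    Segment* read_;   // consumer-owned
};
```

In use, one producer thread calls `push` while a single consumer thread polls `pop`, which returns `false` when nothing is available yet. The second `pop` on the old segment, performed after a non-null `next` is observed, is the crucial step. The release store that links the new segment is ordered after every write to the old one. Once the consumer sees the link, an old segment that is still empty can never receive more items, so the consumer can safely free it.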