We introduce an extremely simple transformation that composes a more scalable concurrent blocking multiset, or bag, from multiple "lanes" of a potentially less scalable underlying multiset. Our design disperses accesses over the lanes, reducing contention and memory coherence hot spots. In Java, for instance, we construct a multiset from multiple lanes of java.util.concurrent.SynchronousQueue that yields more than 8 times the aggregate throughput of a single SynchronousQueue instance when run on a 64-way Sun Niagara-2 system with 16 producer threads and 16 consumer threads. We experimented with various queues from java.util.concurrent and found that, in general, a MultiLane form outperforms its underlying counterpart.
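The lane idea can be illustrated with a minimal Java sketch. The abstract does not say how operations are assigned to lanes, so the dispatch below is an assumption: two atomic counters hand out tickets, and the ticket modulo the lane count selects the lane. The names MultiLaneSketch, putCursor, takeCursor, and width are hypothetical and chosen only for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only: a blocking bag built from several SynchronousQueue lanes.
// Assumption: lanes are chosen by fetch-and-increment tickets, so the k-th put and
// the k-th take are sent to the same lane and can meet there.
final class MultiLaneSketch<E> {
    private final BlockingQueue<E>[] lanes;
    private final AtomicLong putCursor = new AtomicLong();   // next put ticket
    private final AtomicLong takeCursor = new AtomicLong();  // next take ticket

    @SuppressWarnings("unchecked")
    MultiLaneSketch(int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        lanes = (BlockingQueue<E>[]) new BlockingQueue[width];
        for (int i = 0; i < width; i++) {
            lanes[i] = new SynchronousQueue<>();
        }
    }

    // Blocks until a consumer arrives on the lane selected for this put.
    void put(E e) throws InterruptedException {
        int lane = (int) Math.floorMod(putCursor.getAndIncrement(), (long) lanes.length);
        lanes[lane].put(e);
    }

    // Blocks until a producer arrives on the lane selected for this take.
    E take() throws InterruptedException {
        int lane = (int) Math.floorMod(takeCursor.getAndIncrement(), (long) lanes.length);
        return lanes[lane].take();
    }

    public static void main(String[] args) throws InterruptedException {
        MultiLaneSketch<String> bag = new MultiLaneSketch<>(4);
        Thread producer = new Thread(() -> {
            try {
                bag.put("hello");
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        System.out.println(bag.take()); // first put and first take both use lane 0
        producer.join();
    }
}
```

In this sketch the only shared state touched by every operation is one of the two counters; the blocking handoff itself happens on a single lane, which is where the reduction in contention would come from. The result is a bag rather than a queue: elements placed on different lanes may be consumed in a different order than they were produced.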