Flat combining and the synchronization-parallelism tradeoff

Authors: Danny Hendler, Itai Incze, Nir Shavit, Moran Tzafrir
Affiliations: Ben-Gurion University, Beer-Sheva, Israel (Hendler); Tel-Aviv University, Tel-Aviv, Israel (Incze, Shavit, Tzafrir)
Venue: Proceedings of the Twenty-Second Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '10)
Year: 2010

Abstract

Traditional data structure designs, whether lock-based or lock-free, provide parallelism via fine-grained synchronization among threads. We introduce a new synchronization paradigm based on coarse locking, which we call flat combining. The cost of synchronization in flat combining is so low that having a single thread holding a lock perform the combined access requests of all others delivers, up to a certain non-negligible concurrency level, better performance than the most effective parallel, finely synchronized implementations. We use flat combining to devise, among other structures, new linearizable stack, queue, and priority queue algorithms that greatly outperform all prior algorithms.
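To make the mechanism described above concrete, here is a minimal sketch of a flat-combining stack in Python. It is not the authors' implementation. It assumes a fixed number of threads, each identified by a hypothetical integer `tid` that indexes its own publication record. It waits on a `threading.Event` rather than spinning, and it omits the publication-list management a full implementation would need. Each thread publishes its request and then tries to take the single coarse lock. Whichever thread wins becomes the combiner and applies every pending request to a plain sequential list in one pass. The losers simply wait until their result appears. Under CPython's GIL, this illustrates the control flow only, not the performance behavior reported in the paper.

```python
import threading


class FlatCombiningStack:
    """Sketch of a flat-combining stack (simplified: one record slot per hypothetical tid)."""

    def __init__(self, num_threads):
        self._lock = threading.Lock()   # the single coarse lock
        self._items = []                # sequential stack, touched only by the lock holder
        # Publication list: slot tid holds (op, arg) while a request is pending, else None.
        self._records = [None] * num_threads
        self._results = [None] * num_threads
        self._ready = [threading.Event() for _ in range(num_threads)]

    def _combine(self):
        # Runs while holding the lock: scan all records and serve every pending request.
        for tid, rec in enumerate(self._records):
            if rec is None:
                continue
            op, arg = rec
            if op == "push":
                self._items.append(arg)
                result = None
            else:  # "pop": return None when the stack is empty
                result = self._items.pop() if self._items else None
            self._records[tid] = None
            self._results[tid] = result
            self._ready[tid].set()      # hand the response back to the requester

    def _apply(self, tid, op, arg=None):
        self._ready[tid].clear()
        self._records[tid] = (op, arg)  # publish the request in this thread's record
        while not self._ready[tid].is_set():
            if self._lock.acquire(blocking=False):  # try to become the combiner
                try:
                    self._combine()     # serves our own request plus everyone else's
                finally:
                    self._lock.release()
            else:
                # Another thread is combining; wait briefly, then re-check or retry the lock.
                self._ready[tid].wait(timeout=1e-4)
        return self._results[tid]

    def push(self, tid, value):
        self._apply(tid, "push", value)

    def pop(self, tid):
        return self._apply(tid, "pop")


if __name__ == "__main__":
    stack = FlatCombiningStack(num_threads=4)

    def worker(tid):
        for i in range(100):
            stack.push(tid, (tid, i))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    popped = []
    while (x := stack.pop(0)) is not None:
        popped.append(x)
    print(len(popped))
```

The design choice the abstract highlights is visible here. Only the combiner ever touches `_items`, so the stack itself needs no fine-grained synchronization. Contention is reduced to one lock acquisition per combining pass rather than one per operation.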