Performance, scalability, and semantics of concurrent FIFO queues

Authors:
Christoph M. Kirsch;Hannes Payer;Harald Röck;Ana Sokolova
Affiliations:
Department of Computer Sciences, University of Salzburg, Austria;Department of Computer Sciences, University of Salzburg, Austria;Department of Computer Sciences, University of Salzburg, Austria;Department of Computer Sciences, University of Salzburg, Austria
Venue:
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Year:
2012

Abstract

We introduce the notion of a k-FIFO queue, which may dequeue elements out of FIFO order up to a constant k ≥ 0. Retrieving the oldest element from the queue may require up to k+1 dequeue operations (bounded lateness), each of which may return an element not younger than the k+1 oldest elements in the queue (bounded age), or nothing even if there are elements in the queue. A k-FIFO queue is starvation-free for finite k, where k+1 is what we call the worst-case semantical deviation (WCSD) of the queue from a regular FIFO queue. The WCSD bounds the actual semantical deviation (ASD) of a k-FIFO queue from a regular FIFO queue when applied to a given workload. Intuitively, the ASD keeps track of the number of dequeue operations necessary to return the oldest elements and of the age of dequeued elements. We show that a number of existing concurrent algorithms implement k-FIFO queues whose WCSDs are determined by configurable constants independent of any workload. We then introduce so-called Scal queues, which implement k-FIFO queues with generally larger, workload-dependent, and possibly unbounded WCSD. Since ASD cannot be obtained without prohibitive overhead, we have developed a tool that computes lower bounds on ASD from time-stamped runs. Our micro- and macrobenchmarks on a state-of-the-art 40-core multiprocessor machine show that Scal queues, as an immediate consequence of their weaker WCSD, outperform and outscale existing implementations at the expense of moderately increased lower bounds on ASD.
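
The bounded-lateness and bounded-age conditions described in the abstract can be illustrated with a small sequential model. The sketch below is not one of the paper's concurrent algorithms and not a Scal queue. The names `KFifoModel` and `drain`, the random choice within the window, and the skip counter are all assumptions made for illustration. As a further simplification, the model never returns "nothing" while elements are present, although the definition permits it.

```python
import random
from collections import deque


class KFifoModel:
    """Sequential toy model of a k-FIFO queue (hypothetical, for illustration)."""

    def __init__(self, k, seed=None):
        if k < 0:
            raise ValueError("k must be >= 0")
        self.k = k
        self._items = deque()            # oldest element sits at index 0
        self._skips = 0                  # dequeues that bypassed the current oldest
        self._rng = random.Random(seed)  # assumed source of out-of-order choices

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        """Return one of the k+1 oldest elements, or None if the queue is empty."""
        if not self._items:
            return None
        if self._skips >= self.k:
            # Bounded lateness: the oldest element has been bypassed k times,
            # so the (k+1)-th dequeue must return it.
            index = 0
        else:
            # Bounded age: only the k+1 oldest elements are candidates.
            window = min(self.k + 1, len(self._items))
            index = self._rng.randrange(window)
        item = self._items[index]
        del self._items[index]
        self._skips = 0 if index == 0 else self._skips + 1
        return item


def drain(queue):
    """Dequeue until the model reports empty; items must not be None."""
    out = []
    while (item := queue.dequeue()) is not None:
        out.append(item)
    return out


if __name__ == "__main__":
    q = KFifoModel(k=2, seed=0)
    for i in range(10):
        q.enqueue(i)
    print(drain(q))  # a permutation of 0..9 that stays close to FIFO order
```

With `k=0` the window holds only the oldest element, so the model degenerates to a regular FIFO queue. For larger `k`, the oldest element can be bypassed at most `k` times before it is returned, which is the k+1 bound the abstract calls the WCSD.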