Scalable concurrent counting

Authors:
Maurice Herlihy;Beng-Hong Lim;Nir Shavit
Affiliations:
Brown Univ., Providence, RI;Massachusetts Institute of Technology, Cambridge;Tel-Aviv Univ., Tel-Aviv, Israel
Venue:
ACM Transactions on Computer Systems (TOCS)
Year:
1995

Abstract

The notion of counting is central to a number of basic multiprocessor coordination problems, such as dynamic load balancing, barrier synchronization, and concurrent data structure design. We investigate the scalability of a variety of counting techniques for large-scale multiprocessors. We compare counting techniques based on: (1) spin locks, (2) message passing, (3) distributed queues, (4) software combining trees, and (5) counting networks. Our comparison is based on a series of simple benchmarks on a simulated 64-processor Alewife machine, a distributed-memory multiprocessor currently under development at MIT. Locking techniques are known to perform well on small-scale, bus-based multiprocessors, but at larger scale serialization limits their performance and contention can degrade it. Both counting networks and combining trees outperform the other methods substantially by avoiding serialization and alleviating contention, although combining-tree throughput is more sensitive to variations in load. A comparison of shared-memory and message-passing implementations of counting networks and combining trees shows that message-passing implementations have substantially higher throughput.
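
To make the counting-network idea in the abstract concrete, the following is a minimal sketch of a width-4 bitonic counting network used as a shared counter. It is an illustration under stated assumptions, not the implementation benchmarked on Alewife: the names Balancer, CountingNetwork4, and get_and_increment are hypothetical, and Python locks stand in for the atomic toggle operations a real multiprocessor would use. Each balancer forwards successive tokens alternately to its top and bottom output wire, and output wire i hands out the values i, i + 4, i + 8, and so on.

```python
import threading


class Balancer:
    """Toggle that sends successive tokens alternately to its top and bottom wire."""

    def __init__(self, top, bottom):
        self.top, self.bottom = top, bottom
        self._go_top = True              # the first token leaves on the top wire
        self._lock = threading.Lock()    # stand-in for an atomic toggle (assumption)

    def traverse(self):
        with self._lock:
            wire = self.top if self._go_top else self.bottom
            self._go_top = not self._go_top
        return wire


class CountingNetwork4:
    """Width-4 bitonic counting network with one local counter per output wire."""

    WIDTH = 4

    def __init__(self):
        # Three layers of two balancers; every wire meets one balancer per layer.
        self._layers = [
            [Balancer(0, 1), Balancer(2, 3)],   # two width-2 networks
            [Balancer(0, 3), Balancer(1, 2)],   # merger, first layer
            [Balancer(0, 1), Balancer(2, 3)],   # merger, final layer
        ]
        self._next = list(range(self.WIDTH))    # wire i starts at value i
        self._locks = [threading.Lock() for _ in range(self.WIDTH)]

    def get_and_increment(self, wire):
        """Route one token from input `wire` to an output wire and take its value."""
        for layer in self._layers:
            for balancer in layer:
                if wire in (balancer.top, balancer.bottom):
                    wire = balancer.traverse()
                    break
        with self._locks[wire]:
            value = self._next[wire]
            self._next[wire] += self.WIDTH
        return value


def run_threads(network, num_threads, ops_per_thread):
    """Have several threads draw values concurrently; return everything drawn."""
    results, results_lock = [], threading.Lock()

    def worker(thread_id):
        local = [network.get_and_increment(thread_id % network.WIDTH)
                 for _ in range(ops_per_thread)]
        with results_lock:
            results.extend(local)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

No single memory location is touched by every token: a token contends only with others passing through the same balancers and the same output counter, which is the sense in which such a network avoids serialization. Once all threads have finished, the values drawn are distinct and form a gap-free range starting at 0.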