The notion of counting is central to a number of basic multiprocessor coordination problems, such as dynamic load balancing, barrier synchronization, and concurrent data structure design. We investigate the scalability of a variety of counting techniques for large-scale multiprocessors. We compare counting techniques based on (1) spin locks, (2) message passing, (3) distributed queues, (4) software combining trees, and (5) counting networks. Our comparison is based on a series of simple benchmarks on a simulated 64-processor Alewife machine, a distributed-memory multiprocessor currently under development at MIT. Although locking techniques are known to perform well on small-scale, bus-based multiprocessors, at larger scale serialization limits their performance and contention can degrade it. Both counting networks and combining trees substantially outperform the other methods by avoiding serialization and alleviating contention, although combining-tree throughput is more sensitive to variations in load. A comparison of shared-memory and message-passing implementations of counting networks and combining trees shows that the message-passing implementations have substantially higher throughput.
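To make concrete how a counting network avoids a single serialization point, the following is a minimal sketch of a width-4 shared counter built from two-output balancers. It is not the implementation benchmarked on Alewife. The wiring follows the standard bitonic construction from the counting-networks literature, and the names `Balancer`, `BitonicCounter4`, `fetch_and_increment`, and `count_concurrently` are hypothetical, chosen only for illustration.

```python
import threading


class Balancer:
    """Toggle with two outputs: successive tokens leave on top, bottom, top, ..."""

    def __init__(self, top, bottom):
        self.top, self.bottom = top, bottom
        self._next_top = True
        self._lock = threading.Lock()  # guards only this balancer's toggle

    def traverse(self):
        with self._lock:
            wire = self.top if self._next_top else self.bottom
            self._next_top = not self._next_top
        return wire


class BitonicCounter4:
    """Shared counter over a width-4 bitonic counting network (illustrative)."""

    WIDTH = 4
    # Each layer lists its balancers as (top output wire, bottom output wire).
    LAYERS = (((0, 1), (2, 3)), ((0, 3), (1, 2)), ((0, 1), (2, 3)))

    def __init__(self):
        self._layers = []
        for layer in self.LAYERS:
            by_wire = {}
            for top, bottom in layer:
                balancer = Balancer(top, bottom)
                by_wire[top] = by_wire[bottom] = balancer
            self._layers.append(by_wire)
        # Output wire i hands out the values i, i + 4, i + 8, ...
        self._next = list(range(self.WIDTH))
        self._locks = [threading.Lock() for _ in range(self.WIDTH)]

    def fetch_and_increment(self, input_wire):
        wire = input_wire % self.WIDTH
        for by_wire in self._layers:
            wire = by_wire[wire].traverse()
        with self._locks[wire]:
            value = self._next[wire]
            self._next[wire] += self.WIDTH
        return value


def count_concurrently(n_threads, per_thread):
    """Run n_threads workers against one counter; return every value handed out."""
    counter = BitonicCounter4()
    results = [[] for _ in range(n_threads)]

    def worker(tid):
        for _ in range(per_thread):
            results[tid].append(counter.fetch_and_increment(tid))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [v for r in results for v in r]
```

Each token touches three balancers and one output counter, and tokens on different wires mostly touch different locks, whereas a single lock-protected counter forces every increment through the same memory location. Once all threads have finished, the values handed out are exactly 0 through n-1 with no duplicates or gaps. This sketch says nothing about relative throughput, which is what the benchmarks described above measure.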