International Journal of Parallel Programming
In large multiprocessor systems, fast synchronization is crucial for high performance. However, synchronization traffic tends to create “hot-spots” in shared memory and cause network congestion. Multistage shuffle-exchange networks have been proposed and built to handle synchronization traffic. Software combining schemes have also been proposed to relieve network congestion caused by hot-spots. However, multistage combining networks can be very expensive, and software combining can be very slow.

In this paper, we propose a single-stage combining network to handle synchronization traffic, which is separated from the regular memory traffic. A single-stage combining network has several advantages: (1) it is attractive from an implementation perspective because only one stage is needed (instead of log N stages); (2) only one network is needed to handle both forward and returning requests; (3) combined requests are distributed evenly through the network, so the wait buffer size is reduced; and (4) fast-finishing algorithms [30] can be used to shorten the network delay.

Because of these advantages, we show that a single-stage combining network gives good performance at a lower cost than a multistage combining network.
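
As a rough illustration of what request combining means in general, the Python sketch below merges several fetch-and-add requests aimed at one hot-spot address into a single memory access, then splits the reply so that each requester receives the value it would have seen under serial execution. It is not the single-stage network proposed here: the function `combine_fetch_and_add`, the dictionary-based `memory`, and the `barrier_count` address are hypothetical names chosen only for illustration, and the sketch ignores network topology, timing, and buffer capacity.

```python
from typing import Dict, List, Tuple


def combine_fetch_and_add(
    memory: Dict[str, int], addr: str, increments: List[int]
) -> Tuple[List[int], int]:
    """Serve several fetch-and-add requests to one address with one memory access.

    Returns (per-request return values, number of memory accesses used).
    Illustrative only: a software model of combining, not a network design.
    """
    # Combine: the wait buffer records, for each request, the sum of the
    # increments that precede it inside the combined request.
    wait_buffer: List[int] = []
    total = 0
    for inc in increments:
        wait_buffer.append(total)
        total += inc

    # Only the single combined request reaches the memory module.
    old = memory[addr]
    memory[addr] = old + total

    # Decombine: each requester gets the value it would have fetched
    # had the requests been served one after another.
    replies = [old + offset for offset in wait_buffer]
    return replies, 1


# Four processors incrementing a shared counter (hypothetical example).
memory = {"barrier_count": 10}
replies, accesses = combine_fetch_and_add(memory, "barrier_count", [1, 1, 1, 1])
print(replies, memory["barrier_count"], accesses)  # [10, 11, 12, 13] 14 1
```

In this model the hot-spot location is touched once instead of four times, and the wait buffer holds one entry per combined request, which is why the size of that buffer matters in any combining scheme.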