An effective synchronization network for hot-spot accesses

Authors: William Tsun-Yuk Hsu; Pen-Chung Yew
Affiliations: Univ. of Illinois at Urbana-Champaign, Urbana; Univ. of Illinois at Urbana-Champaign, Urbana
Venue: ACM Transactions on Computer Systems (TOCS)
Year: 1992

References (13)

The butterfly barrier. International Journal of Parallel Programming.
Distributing Hot-Spot Addressing in Large-Scale Multiprocessors. IEEE Transactions on Computers.
Compiler algorithms for synchronization. IEEE Transactions on Computers.
Compiler Optimizations for Enhancing Parallelism and Their Impact on Architecture Design. IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems.
A fetch-and-op implementation for parallel computers. ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture.
Analysis of a 3D toroidal network for a shared memory architecture. Proceedings of the 1988 ACM/IEEE conference on Supercomputing.
Software combining algorithms for distributing hot-spot addressing. Journal of Parallel and Distributed Computing.
Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems (TOCS).
Fast barrier synchronization hardware. Proceedings of the 1990 ACM/IEEE conference on Supercomputing.
Multiprocessor communications: design and technology.
Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors. ACM Transactions on Programming Languages and Systems (TOPLAS).
The Use of Feedback in Multiprocessors and Its Application to Tree Saturation Control. IEEE Transactions on Parallel and Distributed Systems.
Software structures for ultraparallel computing.

Cited by (5)

PORTS: a parallel, optimistic, real-time simulator. PADS '94 Proceedings of the eighth workshop on Parallel and distributed simulation.
Request Combining in Multiprocessors with Arbitrary Interconnection Networks. IEEE Transactions on Parallel and Distributed Systems.
Performance Evaluation of Hierarchical Ring-Based Shared Memory Multiprocessors. IEEE Transactions on Computers.
Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers. Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture.
TLSync: support for multiple fast barriers using on-chip transmission lines. Proceedings of the 38th annual international symposium on Computer architecture.


Abstract

In large multiprocessor systems, fast synchronization is crucial for high performance. However, synchronization traffic tends to create “hot-spots” in shared memory and cause network congestion. Multistage shuffle-exchange networks have been proposed and built to handle synchronization traffic. Software combining schemes have also been proposed to relieve network congestion caused by hot-spots. However, multistage combining networks could be very expensive and software combining could be very slow.

In this paper, we propose a single-stage combining network to handle synchronization traffic, which is separated from the regular memory traffic. A single-stage combining network has several advantages: (1) it is attractive from an implementation perspective because only one stage is needed (instead of log N stages); (2) only one network is needed to handle both forward and returning requests; (3) combined requests are distributed evenly through the network, so the wait buffer size is reduced; and (4) fast-finishing algorithms [30] can be used to shorten the network delay.

Because of all these advantages, we show that a single-stage combining network gives good performance at a lower cost than a multistage combining network.
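As a rough illustration of the request combining and wait buffers the abstract refers to, the sketch below models generic fetch-and-add combining in software. It is not the single-stage network proposed in the paper. The names `WaitEntry`, `combine`, `decombine`, and `fetch_and_add_all` are hypothetical, and the repeated pairwise passes are only an assumed stand-in for whatever the hardware does. The idea shown is that two requests to the same hot-spot address merge into one, a wait-buffer entry remembers how to split the reply, and memory sees a single access.

```python
from dataclasses import dataclass


@dataclass
class WaitEntry:
    """Hypothetical wait-buffer record kept where two requests were merged."""
    first_id: int   # requester whose id travels on with the combined request
    second_id: int  # requester that waits for the split reply
    first_inc: int  # increment of the first requester, needed to split the reply


def combine(requests):
    """Pairwise-merge fetch-and-add requests aimed at one hot-spot address.

    requests: list of (requester_id, increment) tuples.
    Returns (forwarded_requests, wait_buffer_entries).
    """
    forwarded, wait_buffer = [], []
    for i in range(0, len(requests) - 1, 2):
        (id_a, inc_a), (id_b, inc_b) = requests[i], requests[i + 1]
        forwarded.append((id_a, inc_a + inc_b))
        wait_buffer.append(WaitEntry(id_a, id_b, inc_a))
    if len(requests) % 2:
        forwarded.append(requests[-1])  # odd request passes through unmerged
    return forwarded, wait_buffer


def decombine(replies, wait_buffer):
    """Split replies: the second requester sees the old value plus first_inc."""
    out = dict(replies)
    for entry in wait_buffer:
        out[entry.second_id] = out[entry.first_id] + entry.first_inc
    return out


def fetch_and_add_all(memory_value, requests):
    """Serve all requests with a single memory access (assumed model)."""
    if not requests:
        return memory_value, {}
    levels, current = [], list(requests)
    while len(current) > 1:
        current, wait_buffer = combine(current)
        levels.append(wait_buffer)
    requester_id, total = current[0]
    replies = {requester_id: memory_value}  # memory returns the old value
    for wait_buffer in reversed(levels):
        replies = decombine(replies, wait_buffer)
    return memory_value + total, replies
```

Under these assumptions, each requester receives the value it would have seen had the requests been served one after another in list order, while the memory module handles only one combined request.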