Distributed Hardwired Barrier Synchronization for Scalable Multiprocessor Clusters

Authors:
Shisheng Shang;Kai Hwang
Affiliations:
-;-
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1995

Citing 18
Cited 13

Effects of synchronization barriers on multiprocessor performance

Parallel Computing
Distributing Hot-Spot Addressing in Large-Scale Multiprocessors

IEEE Transactions on Computers
VLSI assist for a multiprocessor

ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
Applications considerations in the system design of highly concurrent multiprocessors

IEEE Transactions on Computers
Compiler algorithms for synchronization

IEEE Transactions on Computers
The fuzzy barrier: a mechanism for high speed synchronization of processors

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Efficient synchronization primitives for large-scale cache-coherent multiprocessors

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
A scalable implementation of barrier synchronization using an adaptive combining tree

International Journal of Parallel Programming
Synchronization Algorithms for Shared-Memory Multiprocessors

Computer
Processor scheduling in shared memory multiprocessors

SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Algorithms for scalable synchronization on shared-memory multiprocessors

ACM Transactions on Computer Systems (TOCS)
Fast barrier synchronization hardware

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Synchronization with multiprocessor caches

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Optimizing Supercompilers for Supercomputers

Optimizing Supercompilers for Supercomputers
Advanced Computer Architecture: Parallelism, Scalability, Programmability

Advanced Computer Architecture: Parallelism, Scalability, Programmability
PAX Computer; High-Speed Parallel Processing and Scientific Computing

PAX Computer; High-Speed Parallel Processing and Scientific Computing
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
The DASH Prototype: Logic Overhead and Performance

IEEE Transactions on Parallel and Distributed Systems

Synchronization hardware for networks of workstations: performance vs. cost

ICS '96 Proceedings of the 10th international conference on Supercomputing
Designing Tree-Based Barrier Synchronization on 2D Mesh Networks

IEEE Transactions on Parallel and Distributed Systems
Turn Grouping for Efficient Barrier Synchronization in Wormhole Mesh Networks

ICPP '97 Proceedings of the international Conference on Parallel Processing
Dynamic Task Scheduling with Precedence Constraints and Communication Delays

PaCT '99 Proceedings of the 5th International Conference on Parallel Computing Technologies
Distributed-sum termination detection supporting multithreaded execution

Parallel Computing
Fast synchronization on shared-memory multiprocessors: An architectural approach

Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I
Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Tiered Algorithm for Distributed Process Quiescence and Termination Detection

IEEE Transactions on Parallel and Distributed Systems
Scalable barrier synchronisation for large-scale shared-memory multiprocessors

International Journal of High Performance Computing and Networking
Graphical design tool for parallel programs with execution control based on global application states

ISPDC'03 Proceedings of the Second international conference on Parallel and distributed computing
ReMAP: A Reconfigurable Heterogeneous Multicore Architecture

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
TLSync: support for multiple fast barriers using on-chip transmission lines

Proceedings of the 38th annual international symposium on Computer architecture
Low-Overhead, high-speed multi-core barrier synchronization

HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers

Abstract

Conventional multiprocessors mostly use centralized, memory-based barriers to synchronize concurrent processes created in multiple processors. These centralized barriers often become bottlenecks or hot spots in the shared memory. In this paper, we overcome this difficulty by presenting a distributed and hardwired barrier architecture that is hierarchically constructed for fast synchronization in cluster-structured multiprocessors. The hierarchical architecture enables the scalability of cluster-structured multiprocessors. A special set of synchronization primitives is developed for explicit, dynamic use of the distributed barriers. To show the application of the hardwired barriers, we demonstrate how to synchronize Doall and Doacross loops using a limited number of hardwired barriers. Timing analysis shows an $O(10^2)$ to $O(10^5)$ reduction in synchronization overhead, compared with the use of software-controlled barriers implemented in a shared memory. The hardwired architecture is effective in implementing any partially ordered set of barriers or fuzzy barriers with extended synchronization regions. The versatility, scalability, programmability, and low overhead make the distributed barrier architecture attractive in constructing fine-grain, massively parallel MIMD systems using multiprocessor clusters with distributed shared memory.

Index Terms: Barrier synchronization, distributed shared memory, Doacross loops, Doall loops, fuzzy barriers, parallel processing, partially ordered barriers, scalable multiprocessors, wired-NOR logic.
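
The index terms mention wired-NOR logic, but the abstract does not describe the circuit itself. As a rough illustration only, the sketch below models a single wired-NOR barrier line in software: each processor holds its input high while it is still working and releases it on reaching the barrier, so the shared line reads 1 only once every processor has arrived. The class `WiredNorBarrier`, its methods, and `run_demo` are hypothetical names introduced for this sketch; they are not the paper's synchronization primitives, and the model ignores the hierarchical, cluster-level structure the abstract describes.

```python
class WiredNorBarrier:
    """Toy software model of one wired-NOR barrier line (assumed design)."""

    def __init__(self, n):
        # Assumption: each processor drives its input high (1) while busy.
        self.busy = [1] * n

    def arrive(self, pid):
        # Reaching the barrier releases this processor's input (drives it to 0).
        self.busy[pid] = 0

    def line(self):
        # NOR of all inputs: the line reads 1 only when every input is 0.
        return int(not any(self.busy))

    def reset(self):
        # Re-arm the barrier for the next synchronization point.
        self.busy = [1] * len(self.busy)


def run_demo(n=4):
    """Let processors arrive one by one and record the line after each arrival."""
    barrier = WiredNorBarrier(n)
    trace = []
    for pid in range(n):
        barrier.arrive(pid)
        trace.append(barrier.line())
    return trace


if __name__ == "__main__":
    print(run_demo(4))
```

In this model no shared counter is read or updated, which is the contrast the abstract draws with centralized, memory-based barriers: completion is detected by the state of one shared line rather than by repeated accesses to a memory location.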