Fast barrier synchronization hardware

Authors:
Carl J. Beckmann;Constantine D. Polychronopoulos
Affiliations:
Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, 305 Talbot Lab - 104 South Wright Street, Urbana, Illinois;Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, 305 Talbot Lab - 104 South Wright Street, Urbana, Illinois
Venue:
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Year:
1990

Abstract

Many recent studies have considered the effect of barrier synchronization overhead on parallel loop performance, especially for large-scale parallel machines. This paper describes a hardware scheme for supporting fast barrier synchronization. It allows barrier synchronization to be performed within a single instruction cycle for moderately sized systems, and it scales with a logarithmic increase in synchronization time. It supports a large number of concurrent barriers and can also be used to implement a number of different barrier synchronization schemes. Simulation results show that, under reasonable assumptions, this hardware can decrease parallel loop execution time significantly, especially for statically scheduled loops.
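
The abstract does not describe the circuit itself, so the following is only an illustrative sketch of why a logarithmic increase in synchronization time is plausible. It assumes, purely for illustration, that barrier completion is detected by AND-ing one "ready" flag per processor through a binary tree; the function name `and_tree_barrier` and the tree layout are hypothetical and are not taken from the paper.

```python
def and_tree_barrier(ready):
    """Reduce per-processor ready flags with a binary AND tree.

    Hypothetical model, not the paper's hardware design.
    Returns (all_ready, levels), where levels is the number of
    tree levels the reduction passes through.
    """
    level = list(ready)
    if not level:
        raise ValueError("need at least one processor")
    levels = 0
    while len(level) > 1:
        # AND neighbouring pairs; an unpaired last flag passes through.
        nxt = [level[i] and level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return level[0], levels


if __name__ == "__main__":
    for n in (4, 64, 1024):
        done, depth = and_tree_barrier([True] * n)
        print(n, done, depth)
```

In this model the number of levels equals the ceiling of log2 of the processor count, so doubling the machine size adds one level of delay rather than doubling it.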