Many recent studies have examined how barrier synchronization overhead affects parallel loop performance, especially on large-scale parallel machines. This paper describes a hardware scheme for fast barrier synchronization. For moderately sized systems, a barrier completes within a single instruction cycle, and the scheme scales to larger systems with only a logarithmic increase in synchronization time. It supports a large number of concurrent barriers and can also implement several different barrier synchronization schemes. Simulation results show that, under reasonable assumptions, this hardware can significantly reduce parallel loop execution time, especially for statically scheduled loops.
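A logarithmic increase in synchronization time is what one would expect from a tree of logic gates that combines per-processor arrival signals. The sketch below models such a tree in Python. It assumes a generic tree of k-input AND gates with a per-barrier participation mask, which is one common way to build hardware barriers but is not necessarily the circuit this paper proposes. The names `and_tree_depth`, `barrier_complete`, and the `fan_in` parameter are hypothetical. If the whole tree settles within one clock cycle for small processor counts, a barrier completes in a single cycle; in general the tree depth grows as ceil(log_k N).

```python
from typing import Sequence


def and_tree_depth(n_procs: int, fan_in: int) -> int:
    """Gate levels needed to AND n_procs arrival signals using
    fan_in-input AND gates (hypothetical tree model, not the paper's circuit)."""
    if n_procs < 1 or fan_in < 2:
        raise ValueError("need n_procs >= 1 and fan_in >= 2")
    depth, width = 0, n_procs
    while width > 1:
        width = -(-width // fan_in)  # ceiling division: signals left after one gate level
        depth += 1
    return depth


def barrier_complete(arrived: Sequence[bool], mask: Sequence[bool], fan_in: int) -> bool:
    """Evaluate one barrier's AND tree level by level.

    Processors outside the barrier (mask False) are forced to 'arrived',
    so several barriers with different masks can share the same processors
    and be evaluated independently (an assumed masking scheme)."""
    if fan_in < 2:
        raise ValueError("need fan_in >= 2")
    signals = [a or not m for a, m in zip(arrived, mask)]
    if not signals:
        raise ValueError("need at least one processor")
    while len(signals) > 1:
        # One gate level: AND each group of fan_in neighbouring signals.
        signals = [all(signals[i:i + fan_in]) for i in range(0, len(signals), fan_in)]
    return signals[0]


# Gate depth grows logarithmically with processor count.
for n in (4, 16, 64, 256):
    print(n, and_tree_depth(n, fan_in=4))  # depths 1, 2, 3, 4

# Two concurrent barriers on 8 processors: A = {0..3}, B = {4..7}.
n_procs = 8
mask_a = [i < 4 for i in range(n_procs)]
mask_b = [i >= 4 for i in range(n_procs)]
arrived = [True, True, True, True, True, False, True, True]  # processor 5 has not arrived
print(barrier_complete(arrived, mask_a, fan_in=4))  # True: every member of A has arrived
print(barrier_complete(arrived, mask_b, fan_in=4))  # False: processor 5 is still missing
```

Masking non-participants to "arrived" lets one set of arrival lines serve several barriers at once, which illustrates how many concurrent barriers could be supported; the paper may realize this differently.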