In this paper, we consider a tree-based routing scheme for supporting barrier synchronization on scalable parallel computers with a 2D mesh network. Based on the characteristics of a standard programming interface, the scheme uses a distributed algorithm to build a collective synchronization (CS) tree among the participating nodes. Once the routers are set up with the CS tree information, barrier synchronization can be accomplished very efficiently by passing simple messages. Performance evaluations show that our proposed method performs better than previous path-based approaches and is less sensitive to variations in group size and startup delay. However, our scheme incurs the extra overhead of building the CS tree, so it is more suitable for parallel iterative computations in which the same barrier is invoked repeatedly.
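The general idea of a tree-based barrier can be illustrated with a small simulation. The sketch below is not the CS tree algorithm of the paper: it assumes a centralized, Prim-style construction as a stand-in for the distributed algorithm, ignores router setup and wormhole routing, and simply counts messages. The names `build_tree`, `barrier`, and `manhattan` are hypothetical helpers introduced for illustration.

```python
from collections import defaultdict


def manhattan(a, b):
    """Hop distance between two nodes of a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def build_tree(members):
    """Build a tree over the participating nodes only.

    Assumption: a centralized stand-in for a distributed construction.
    Each step attaches the unattached member that is closest (in mesh
    hops) to some already-attached member.
    """
    members = sorted(set(members))
    root = members[0]
    parent = {root: None}
    remaining = members[1:]
    while remaining:
        node, par = min(
            ((n, p) for n in remaining for p in parent),
            key=lambda pair: (manhattan(*pair), pair),
        )
        parent[node] = par
        remaining.remove(node)
    return root, parent


def barrier(root, parent):
    """Simulate one barrier on a prebuilt tree.

    Gather phase: each non-root node sends one 'arrived' message to its
    parent after hearing from all of its children.
    Release phase: the root sends 'release' messages back down the tree.
    Returns (message_count, released_nodes).
    """
    children = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)

    messages = 0

    def gather(node):
        nonlocal messages
        for child in children[node]:
            gather(child)
            messages += 1  # 'arrived' message: child -> node

    gather(root)

    released = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            messages += 1  # 'release' message: node -> child
            released.add(child)
            stack.append(child)
    return messages, released


# Build the tree once, then reuse it for every iteration's barrier.
group = [(0, 0), (3, 1), (1, 2), (5, 5), (2, 2)]
root, parent = build_tree(group)
for _ in range(3):
    count, released = barrier(root, parent)
print(count, len(released))
```

Under these assumptions each barrier costs two messages per non-root member, while the tree is built only once. This reflects the trade-off noted above: the construction overhead is amortized when the same barrier is invoked many times.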