Efficient Barrier Synchronization Mechanism for the BSP Model on Message-Passing Architectures

Authors:
C. Jhon
Venue:
IPPS '98: Proceedings of the 12th International Parallel Processing Symposium
Year:
1998

Abstract

The Bulk Synchronous Parallel (BSP) model of computation can be used to develop efficient and portable programs for a range of machines and applications. However, the barrier synchronization used in the BSP model is relatively expensive on message-passing architectures. In this paper, we relax the barrier synchronization constraint of the BSP model to allow an efficient implementation on message-passing architectures. In our relaxed barrier synchronization, synchronization occurs only between the producer and consumer processors, at the time non-local data is accessed, which eliminates the exchange of global information. In experimental evaluations on an IBM SP2, we observed that the relaxed barrier synchronization reduces the total synchronization time by 45.2% to 61.5% in FT, and by 28.6% to 49.0% in LU with 32 processors.
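
The idea of synchronizing only between producer and consumer can be illustrated with a small simulation. The sketch below is not the paper's implementation: it uses Python threads in place of message-passing processors, and the names `RelaxedSync`, `publish`, `read_remote`, and `run_ring` are hypothetical, introduced only for illustration. Each simulated processor publishes its value for a superstep and then blocks only on the one neighbor whose data it reads, so no global barrier is ever executed.

```python
import threading


class RelaxedSync:
    """Pairwise producer/consumer synchronization (illustrative assumption,
    standing in for the global barrier at the end of a BSP superstep)."""

    def __init__(self, num_procs):
        self._cond = threading.Condition()
        # _published[p] is the last superstep whose data processor p has released.
        self._published = [-1] * num_procs
        # _data[p][step] holds the value processor p produced in that superstep.
        self._data = [dict() for _ in range(num_procs)]

    def publish(self, proc, step, value):
        """Producer side: make this superstep's value visible to consumers."""
        with self._cond:
            self._data[proc][step] = value
            self._published[proc] = step
            self._cond.notify_all()

    def read_remote(self, producer, step):
        """Consumer side: wait only until `producer` has published `step`."""
        with self._cond:
            self._cond.wait_for(lambda: self._published[producer] >= step)
            return self._data[producer][step]


def run_ring(num_procs=4, num_steps=3):
    """Each processor repeatedly adds its left neighbor's value to its own."""
    sync = RelaxedSync(num_procs)
    results = [None] * num_procs

    def worker(rank):
        value = rank
        left = (rank - 1) % num_procs
        for step in range(num_steps):
            sync.publish(rank, step, value)        # produce local data
            value += sync.read_remote(left, step)  # sync with one producer only
        results[rank] = value

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_ring())
```

Because values are stored per superstep, a fast processor may run ahead of slower ones without overwriting data that a consumer has not yet read. A real message-passing version would have to bound that buffering and carry the superstep number in each message; both are left out of this sketch.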