In this paper, we propose a new compiler technique for eliminating barrier synchronizations. In our approach, the compiler collects information about array accesses and analyzes data dependences. If there is no dependence, the barrier synchronization can be eliminated. Even when a dependence is detected, the barrier synchronization can in some cases be replaced with send-receive pairs of communications. For evaluation, we ran two application programs, the Jacobi method and Gaussian elimination, on a PC cluster with barrier elimination applied; for comparison, we also ran the same programs without barrier elimination. With barrier elimination, 1) the execution time was always reduced, and 2) the reduction ratio of the execution time grew as the number of processors increased. With 16 processors, the execution time was reduced by 19.00% for the Jacobi method and by 50.36% for Gaussian elimination.
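The dependence test described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' compiler. Per-processor access information is modeled as Python sets of hypothetical `(array_name, index)` pairs rather than compiler-derived array sections. The names `PhaseAccess`, `cross_processor_deps`, `plan_barrier` and `block` are illustrative. The `max_partners` cutoff used to choose send-receive pairs over a barrier is an assumed heuristic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PhaseAccess:
    """Elements one processor reads and writes in one parallel phase.

    Elements are hypothetical (array_name, index) pairs standing in for the
    array access information a compiler would collect.
    """
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def cross_processor_deps(before, after):
    """Return (src, dst) pairs of distinct processors where dst, after the
    barrier, conflicts with what src did before it: flow (write -> read),
    anti (read -> write) or output (write -> write) dependence."""
    deps = set()
    for src, a in enumerate(before):
        for dst, b in enumerate(after):
            if src == dst:
                continue  # same processor: program order already orders these
            if (a.writes & b.reads) or (a.reads & b.writes) or (a.writes & b.writes):
                deps.add((src, dst))
    return deps


def plan_barrier(before, after, max_partners=2):
    """Decide what to do with the barrier between two phases.

    No cross-processor dependence: eliminate the barrier. If every processor
    waits on at most max_partners others (an assumed cutoff), replace the
    barrier with point-to-point send-receive pairs. Otherwise keep it.
    """
    deps = cross_processor_deps(before, after)
    if not deps:
        return "eliminate", []
    waits_on = {}
    for _src, dst in deps:
        waits_on[dst] = waits_on.get(dst, 0) + 1
    if max(waits_on.values()) <= max_partners:
        return "send_recv", sorted(deps)
    return "keep_barrier", sorted(deps)


def block(p, size):
    """Indices of the contiguous block owned by processor p (block distribution)."""
    return range(p * size, (p + 1) * size)


if __name__ == "__main__":
    nprocs, size, n = 4, 4, 16
    # Jacobi-like step (illustrative): phase 1 writes its own block of a;
    # phase 2 reads its block plus one halo element per side, writes its block of b.
    before = [PhaseAccess(writes={("a", i) for i in block(p, size)}) for p in range(nprocs)]
    after = [
        PhaseAccess(
            reads={("a", i) for i in range(max(0, p * size - 1), min(n, (p + 1) * size + 1))},
            writes={("b", i) for i in block(p, size)},
        )
        for p in range(nprocs)
    ]
    print(plan_barrier(before, after))
    # Purely local phases: no cross-processor dependence at all.
    local = [
        PhaseAccess(reads={("a", i) for i in block(p, size)},
                    writes={("a", i) for i in block(p, size)})
        for p in range(nprocs)
    ]
    print(plan_barrier(local, local))
```

In the Jacobi-style example, each block depends only on its neighbours' boundary elements, so the sketch proposes neighbour send-receive pairs instead of a global barrier. The purely local phases need no synchronization at all. A real compiler would compare symbolic array sections rather than enumerating individual indices as this sketch does.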