Compiler algorithms for event variable synchronization
ICS '91 Proceedings of the 5th international conference on Supercomputing
A compiler-assisted approach to SPMD execution
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Implementation and performance of Munin
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Data-parallel programming on MIMD computers
Data-parallel programming on MIMD computers
Compiling Fortran D for MIMD distributed-memory machines
Communications of the ACM
Lazy release consistency for software distributed shared memory
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The high performance Fortran handbook
The high performance Fortran handbook
Evaluation of release consistent software distributed shared memory on emerging network technology
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Managing pages in shared virtual memory systems: getting the compiler into the game
ICS '93 Proceedings of the 7th international conference on Supercomputing
Efficient support for irregular applications on distributed-memory machines
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiler optimizations for eliminating barrier synchronization
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Detecting coarse-grain parallelism using an interprocedural parallelizing compiler
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Compiler reduction of synchronisation in shared virtual memory systems
ICS '95 Proceedings of the 9th international conference on Supercomputing
An integrated compile-time/run-time software distributed shared memory system
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Compiler techniques for data synchronization in nested parallel loops
ICS '90 Proceedings of the 4th international conference on Supercomputing
Distributed shared memory systems with improved barrier synchronization and data transfer
ICS '97 Proceedings of the 11th international conference on Supercomputing
A graph based approach to barrier synchronisation minimisation
ICS '97 Proceedings of the 11th international conference on Supercomputing
Optimizing communication in HPF programs on fine-grain distributed shared memory
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiler-directed shared-memory communication for iterative parallel applications
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Improving the performance of DSM systems via compiler involvement
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Enhancing Software DSM for Compiler-Parallelized Applications
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
HPF on Fine-Grain Distributed Shared Memory: Early Experience
LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Synchronization Issues in Data-Parallel Languages
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Automatic synchronisation elimination in synchronous FORALLs
FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)
A Performance Debugger for Eliminating Excess Synchronization in Shared-Memory Parallel Programs
MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
The relative importance of concurrent writers and weak consistency models
ICDCS '96 Proceedings of the 16th International Conference on Distributed Computing Systems (ICDCS '96)
Simulation tools to study a distributed shared memory for clusters of symmetric multiprocessors
Future Generation Computer Systems
Hi-index | 0.00 |
Software distributed-shared-memory (DSM) systems provide an appealing target for parallelizing compilers due to their flexibility. Previous studies demonstrate such systems can provide performance comparable to messagepassing compilers for dense-matrix kernels. However, synchronization and load imbalance are significant sources of overhead. In this paper, we investigate the impact of compilation techniques for eliminating barrier synchronization overhead in software DSMs. Our compile-time barrier elimination algorithm extends previous techniques in three ways: (1) we perform inexpensive communication analysis through local subscript analysis when using chunk iteration partitioning for parallel loops; (2) we exploit delayed updates in lazy-release-consistency DSMs to eliminate barriers guarding only anti-dependences; (3) when possible we replace barriers with customized nearest-neighbor synchronization. Experiments on an IBM SP-2 indicate these techniques can improve parallel performance by 20% on average and by up to 60% for some applications.