A global approach to detection of parallelism
A global approach to detection of parallelism
Compiler algorithms for synchronization
IEEE Transactions on Computers
Automatic decomposition of scientific programs for parallel execution
POPL '87 Proceedings of the 14th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Supercompilers for parallel and vector computers
Supercompilers for parallel and vector computers
Compiler algorithms for event variable synchronization
ICS '91 Proceedings of the 5th international conference on Supercomputing
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Eliminating false data dependences using the Omega test
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Optimizing for parallelism and data locality
ICS '92 Proceedings of the 6th international conference on Supercomputing
Fortran-S: a Fortran interface for shared virtual memory architectures
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Unified compilation of Fortran 77D and 90D
ACM Letters on Programming Languages and Systems (LOPLAS)
Reducing data communication overhead for DOACROSS loop nests
ICS '94 Proceedings of the 8th international conference on Supercomputing
Evaluating automatic parallelization for efficient execution on shared-memory multiprocessors
ICS '94 Proceedings of the 8th international conference on Supercomputing
Compiler optimizations for eliminating barrier synchronization
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Synchronization minimization in a SPMD execution model
Journal of Parallel and Distributed Computing - Special issue on distributed shared memory systems
Compiler optimizations for parallel loops with fine-grained synchronization
Compiler optimizations for parallel loops with fine-grained synchronization
Compiler reduction of synchronisation in shared virtual memory systems
ICS '95 Proceedings of the 9th international conference on Supercomputing
Static analysis to reduce synchronization costs in data-parallel programs
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Compiler techniques for data synchronization in nested parallel loops
ICS '90 Proceedings of the 4th international conference on Supercomputing
A graph based approach to barrier synchronisation minimisation
ICS '97 Proceedings of the 11th international conference on Supercomputing
First fast sink: a compiler algorithm for barrier placement optimisation
Future Generation Computer Systems - Special issue on HPCN '97
Eliminating barrier synchronization for compiler-parallelized codes on software DSMs
International Journal of Parallel Programming - Special issue on languages and compilers for parallel computing. Part I
Foundations of Computer Science
Foundations of Computer Science
Multiprocessor Synchronization for Concurrent Loops
IEEE Software
Barrier Synchronisation Optimisation
HPCN Europe '97 Proceedings of the International Conference and Exhibition on High-Performance Computing and Networking
Synchronization Issues in Data-Parallel Languages
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
A New Algorithm for Global Optimization for Parallelism and Locality
LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
Automatic synchronisation elimination in synchronous FORALLs
FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)
Compile-time scheduling and optimization for asynchronous machines (multiprocessor, compiler, parallel processing)
Compiler parallelization of C programs for multi-core DSPs with multiple address spaces
Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A Complete Compiler Approach to Auto-Parallelizing C Programs for Multi-DSP Systems
IEEE Transactions on Parallel and Distributed Systems
A linear-time algorithm for optimal barrier placement
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit
International Journal of High Performance Computing Applications
Barrier matching for programs with textually unaligned barriers
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Enforcing textual alignment of collectives using dynamic checks
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Hi-index | 0.00 |
This paper presents a new compiler approach to minimizing the number of barriers executed in parallelized programs. A simple procedure is developed to reduce the complexity of barrier placement by eliminating certain data dependences, without affecting optimality. An algorithm is presented which, provably, places the minimal number of barriers in perfect loop nests and in certain imperfect loop nest structures. This scheme is generalized to accept entire, well-structured control-flow programs containing arbitrary nesting of IF constructs, loops, and subroutines. It has been implemented in a prototype parallelizing compiler and applied to several well-known benchmarks where it has been shown to place significantly fewer synchronization points than existing techniques. Experiments indicate that on average the number of barriers executed is reduced by 70 percent and there is a three fold improvement in execution time when evaluated on a 32-processor SGI Origin 2000.