We present compiler techniques for translating OpenMP shared-memory parallel applications into MPI message-passing programs for execution on distributed memory systems. This translation aims to extend the ease of creating parallel applications with OpenMP to a wider variety of platforms, such as commodity cluster systems. We present key concepts and describe techniques to analyze and efficiently handle both regular and irregular accesses to shared data.

We evaluate the performance achieved by our translation scheme on seven representative OpenMP applications, two from SPEC OMPM2001 and five from the NAS Parallel Benchmarks suite, on two different platforms. The average scalability (execution time relative to the serial version) achieved is within 12% of that achieved by corresponding hand-tuned MPI applications. We also compare our programs with versions deployed for a Software Distributed Shared Memory (SDSM) system and find that the direct translation to MPI achieves up to 30% higher scalability. A comparison with High Performance Fortran (HPF) versions of two NAS benchmarks indicates that our translated OpenMP versions achieve 12% to 89% better performance than the HPF versions.
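As a rough illustration of the general idea behind moving a shared-memory parallel loop to a message-passing setting, the sketch below partitions a loop's iteration space into contiguous blocks, one per process, and then combines the per-process results in rank order. It is a plain Python simulation under stated assumptions, not the translation scheme evaluated here: the names `block_range` and `simulate_parallel_for` are hypothetical, the loop body `b[i] = 2 * a[i]` is an invented regular access pattern, and no real MPI calls are made.

```python
def block_range(n_iters, n_procs, rank):
    """Return the half-open range [lo, hi) of iterations owned by `rank`.

    Assumption: a static block partition, where the first
    `n_iters % n_procs` ranks each take one extra iteration.
    """
    base, extra = divmod(n_iters, n_procs)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def simulate_parallel_for(a, n_procs):
    """Simulate `for i in range(n): b[i] = 2 * a[i]` split across ranks.

    Each simulated rank computes only its own block. Concatenating the
    blocks in rank order stands in for the collective exchange that a
    message-passing program would need to rebuild the full array.
    """
    n = len(a)
    partial = []
    for rank in range(n_procs):
        lo, hi = block_range(n, n_procs, rank)
        partial.append([2 * a[i] for i in range(lo, hi)])
    # Combine per-rank blocks in rank order.
    return [x for chunk in partial for x in chunk]


if __name__ == "__main__":
    for r in range(4):
        print(r, block_range(10, 4, r))
    print(simulate_parallel_for(list(range(10)), 4))
```

With regular accesses like this one, the set of elements each process writes can be derived from the loop bounds alone. Irregular accesses, such as indexing through another array, are the harder case because the elements touched are not known from the bounds.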