This paper describes extensions to OpenMP that provide the data placement features needed for NUMA architectures. OpenMP is a collection of compiler directives and library routines for writing portable parallel programs for shared-memory architectures. NUMA architectures have characteristics of both shared-memory and distributed-memory architectures, so writing efficient parallel programs for them requires that the programmer control both where data is placed in memory and where the computations that operate on that data are executed. Performance is best when each computation runs on a processor that has fast access to the data it needs. OpenMP, having been designed for shared-memory architectures, does not by itself address these issues. The extensions to OpenMP Fortran presented here are taken mainly from High Performance Fortran. The paper describes some of the techniques the Compaq Fortran compiler uses to generate efficient code from these extensions, describes some additional compiler optimizations, and concludes with preliminary results.
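The abstract does not reproduce the directive syntax of the Compaq extensions, so the following is only a hedged illustration of the High Performance Fortran idea they borrow. It computes the ownership map of an HPF-style BLOCK distribution (block size ceiling(N/P)) and runs an "owner computes" loop in which each processor updates only the elements stored in its local memory. The function names are hypothetical, indices are 0-based rather than Fortran's 1-based, and the processors are simulated serially; this is not the compiler's actual implementation.

```python
# Hypothetical sketch of an HPF-style BLOCK distribution and the
# "owner computes" rule. Indices are 0-based; processors are simulated serially.


def block_size(n: int, p: int) -> int:
    """Elements per processor under BLOCK distribution: ceiling(n / p)."""
    return -(-n // p)  # integer ceiling division


def owner(i: int, n: int, p: int) -> int:
    """Processor whose local memory holds element i."""
    return i // block_size(n, p)


def local_range(proc: int, n: int, p: int) -> range:
    """Indices stored on processor `proc` (may be empty for trailing processors)."""
    b = block_size(n, p)
    lo = min(proc * b, n)
    hi = min(lo + b, n)
    return range(lo, hi)


def owner_computes_scale(a: list[float], p: int, factor: float) -> None:
    """Scale every element in place, with each processor touching only its own block.

    On a NUMA machine this keeps every update local to the memory that holds
    the data; here the per-processor loops simply run one after another.
    """
    n = len(a)
    for proc in range(p):
        for i in local_range(proc, n, p):
            a[i] *= factor
```

For example, 10 elements on 4 processors give a block size of 3, so the local ranges are indices 0-2, 3-5, 6-8, and 9, and element 9 is owned by processor 3. Because the blocks are contiguous, each processor's work stays within a single region of its local memory.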