The trade-off between implicit and explicit data distribution in shared-memory programming paradigms
ICS '01 Proceedings of the 15th international conference on Supercomputing
Exploiting memory affinity in OpenMP through schedule reuse
ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops
Runtime vs. Manual Data Distribution for Architecture-Agnostic Shared-Memory Programming Models
International Journal of Parallel Programming
International Journal of Parallel Programming
High-Level Data Mapping for Clusters of SMPs
HIPS '01 Proceedings of the 6th International Workshop on High-Level Parallel Programming Models and Supportive Environments
OpenMP versus MPI for PDE Solvers Based on Regular Sparse Numerical Operators
ICCS '02 Proceedings of the International Conference on Computational Science-Part III
Language and Compiler Support for Hybrid-Parallel Programming on SMP Clusters
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Towards OpenMP Execution on Software Distributed Shared Memory Systems
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Performance of High-Accuracy PDE Solvers on a Self-Optimizing NUMA Architecture
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
A Study of Implicit Data Distribution Methods for OpenMP Using the SPEC Benchmarks
WOMPAT '01 Proceedings of the International Workshop on OpenMP Applications and Tools: OpenMP Shared Memory Parallel Programming
Performance Oriented Programming for NUMA Architectures
WOMPAT '01 Proceedings of the International Workshop on OpenMP Applications and Tools: OpenMP Shared Memory Parallel Programming
Communication and Optimization Aspects on Hybrid Architectures
Proceedings of the 9th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Program Development Environment for OpenMP Programs on ccNUMA Architectures
LSSC '01 Proceedings of the Third International Conference on Large-Scale Scientific Computing-Revised Papers
User-controllable coherence for high performance shared memory multiprocessors
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimizing OpenMP programs on software distributed shared memory systems
International Journal of Parallel Programming - Special issue: OpenMP: Experiences and implementations
Towards automatic translation of OpenMP to MPI
Proceedings of the 19th annual international conference on Supercomputing
OpenMP versus MPI for PDE solvers based on regular sparse numerical operators
Future Generation Computer Systems
Study of OpenMP applications on the InfiniBand-based software distributed shared-memory system
Parallel Computing - OpenMp
The rise and fall of High Performance Fortran: an historical object lesson
Proceedings of the third ACM SIGPLAN conference on History of programming languages
A transparent runtime data distribution engine for OpenMP
Scientific Programming
OpenMP issues arising in the development of parallel BLAS and LAPACK libraries
Scientific Programming - OpenMP
Scaling non-regular shared-memory codes by reusing custom loop schedules
Scientific Programming - OpenMP
Exploiting fine-grain thread parallelism on multicore architectures
Scientific Programming - Software Development for Multi-core Computing Systems
OpenMP versus MPI for PDE solvers based on regular sparse numerical operators
Future Generation Computer Systems
Language support for multi-paradigm and multi-grain parallelism on SMP-Cluster
International Journal of Computers and Applications
OpenMP and NUMA architectures I: Investigating memory placement on the SGI origin 3000
ICCS'03 Proceedings of the 2003 international conference on Computational science
Supporting realistic OpenMP applications on a commodity cluster of workstations
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Efficient OpenMP data mapping for multicore platforms with vertically stacked memory
Proceedings of the Conference on Design, Automation and Test in Europe
Vertical stealing: robust, locality-aware do-all workload distribution for 3D MPSoCs
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Affinity-on-next-touch: an extension to the Linux kernel for NUMA architectures
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Exploiting thread-data affinity in OpenMP with data access patterns
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Supporting OpenMP on a multi-cluster embedded MPSoC
Microprocessors & Microsystems
HiPC'04 Proceedings of the 11th international conference on High Performance Computing
Node-based memory management for scalable NUMA architectures
Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers
Experimenting with low-overhead OpenMP runtime on IBM Blue Gene/Q
IBM Journal of Research and Development
Hi-index | 0.00 |
This paper describes extensions to OpenMP which implement data placement features needed for NUMA architectures. OpenMP is a collection of compiler directives and library routines used to write portable parallel programs for shared-memory architectures. Writing efficient parallel programs for NUMA architectures, which have characteristics of both shared-memory and distributed-memory architectures, requires that a programmer control the placement of data in memory and the placement ofcomputations that operate on that data. Optimal performance is obtained when computations occur on processors that have fast access to the data needed by those computations. OpenMP-designed for shared-memory architectures-does not by itself address these issues.The extensions to OpenMP Fortran presented here have been mainly taken from High Performance Fortran. The paper describes some of the techniques that the Compaq Fortran compiler uses to generate efficient code based on these extensions. It also describes some additional compiler optimizations, and concludes with some preliminary results.