The rapidly increasing number of cores in modern microprocessors is pushing current high performance computing (HPC) systems into the petascale and exascale era. The hybrid nature of these systems, with distributed memory across nodes and shared memory with non-uniform memory access (NUMA) within each node, poses a challenge to application developers. In this paper, we study a hybrid approach to programming such systems that combines two traditional programming models, MPI and OpenMP. We present the performance of standard benchmarks from the multi-zone NAS Parallel Benchmarks and of two full applications using this approach on several multi-core-based systems, including an SGI Altix 4700, an IBM p575+, and an SGI Altix ICE 8200EX. We also present new data locality extensions to OpenMP to better match the hierarchical memory structure of multi-core architectures.