The OpenMP Application Program Interface supports parallel programming on scalable shared-memory symmetric multiprocessor (SMP) machines by providing the user with simple work-sharing directives for C/C++ and Fortran, so that the compiler can generate parallel programs based on thread parallelism. However, the lack of language features for exploiting data locality often results in poor performance, since the non-uniform memory access times on scalable SMP machines cannot be neglected. HPF, the de facto standard for data-parallel programming, offers a rich set of data distribution directives to exploit data locality, but has mainly been targeted toward distributed memory machines. In this paper we describe an optimized execution model for HPF programs on SMP machines that avails itself of the mechanisms provided by OpenMP for work sharing and thread parallelism, while exploiting data locality based on user-specified distribution directives. This execution model has been implemented in the ADAPTOR HPF compilation system, and experimental results confirm the efficiency of the chosen approach.
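To illustrate the combination of directives the abstract describes, consider the following minimal sketch (a hypothetical example, not taken from the paper): the user annotates an HPF program with a DISTRIBUTE directive to express data locality, and marks the loop INDEPENDENT so the compiler is free to parallelize it.

    ! Hypothetical HPF input fragment (illustrative only).
    PROGRAM hybrid_example
      INTEGER, PARAMETER :: n = 1000000
      REAL :: a(n), b(n)
    !HPF$ DISTRIBUTE a(BLOCK)        ! data locality: block distribution of a
    !HPF$ ALIGN b(i) WITH a(i)       ! keep b co-located with a
      INTEGER :: i

      b = 1.0
    !HPF$ INDEPENDENT                ! loop iterations carry no dependences
      DO i = 1, n
         a(i) = 2.0 * b(i)
      END DO
    END PROGRAM hybrid_example

Under an execution model of this kind, an HPF compiler such as ADAPTOR could translate the independent loop into OpenMP work-sharing constructs, roughly as sketched below (the actual code generated by ADAPTOR will differ); a static schedule lets each thread traverse a contiguous block of iterations, matching the BLOCK distribution of a and b.

    ! Sketch of a possible OpenMP translation (not actual ADAPTOR output).
    !$OMP PARALLEL DO SCHEDULE(STATIC)
      DO i = 1, n
         a(i) = 2.0 * b(i)
      END DO
    !$OMP END PARALLEL DO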