Towards a more efficient implementation of OpenMP for clusters via translation to global arrays
Parallel Computing - OpenMP
Accelerator: using data parallelism to program GPUs for general-purpose uses
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
A streaming machine description and programming model
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
An evaluation of OpenMP on current and emerging multithreaded/multicore processors
IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
Deriving Efficient Data Movement from Decoupled Access/Execute Specifications
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
To date, OpenMP has been considered the workhorse for data parallelism and, more recently, task-level parallelism. The model has been one of shared memory, with threads working in parallel on arrays of a uniform nature, but many applications do not fit these often restrictive access patterns. With the development of accelerators on the one hand, and the move beyond the node to the cluster on the other, OpenMP's shared memory approach does not easily capture the complex memory hierarchies found in these heterogeneous systems.

Streams provide a natural approach to coupling data with its corresponding access patterns. Data within a stream can be easily and efficiently distributed across complex memory hierarchies, while retaining a shared memory point of view for the application programmer.

In this paper we present a modest extension to OpenMP to support data partitioning and streaming. Rather than add numerous new directives, our approach is to utilize existing streaming technology and extend OpenMP simply to control streams in the context of threading. The integration of streams allows the programmer to easily connect distinct compute components together in an efficient manner, supporting both the conventional shared memory model of OpenMP and the transparent integration of local non-shared memory.