OpenMP has attracted widespread interest because it is an easy-to-use parallel programming model for shared-memory multiprocessor systems. The implementation of a "cluster-enabled" OpenMP compiler is presented. Compiled programs are linked to the page-based software distributed shared memory system SCASH, which runs on PC clusters, allowing OpenMP programs to run transparently in a distributed-memory environment. The compiler translates OpenMP programs into parallel programs using the SCASH static library, moving all shared global variables into the SCASH shared address space at runtime. Because data mapping has a great impact on the performance of OpenMP programs compiled for software distributed shared memory, extensions to the OpenMP directives are defined for specifying data mapping and loop scheduling behavior, allowing data to be allocated on the node where it is to be processed. Experimental results for benchmark programs on PC clusters using both Myrinet and Fast Ethernet are reported.
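
To make the compilation target concrete, below is a minimal sketch of the kind of OpenMP program such a compiler handles. It is standard C with OpenMP and should build with any OpenMP compiler (e.g., gcc -fopenmp). The commented-out mapping pragma is a hypothetical placeholder for the data-mapping extension mentioned above; the abstract does not give the actual directive syntax, so that line is illustrative only.

/* Minimal sketch of an OpenMP program of the kind a cluster-enabled
 * compiler targets. Standard OpenMP only; compile with e.g.
 * gcc -fopenmp. On SCASH, the shared global arrays below would be
 * relocated into the shared address space at runtime. */
#include <stdio.h>
#include <omp.h>

#define N 1000000

double a[N], b[N];   /* shared globals */

/* Hypothetical data-mapping extension (illustrative placeholder, not
 * the paper's actual syntax):
 *   #pragma mapping block (a, b)
 * would block-distribute a and b so each node owns the pages it
 * processes. */

int main(void)
{
    /* With a block mapping, static scheduling aligns each thread's
     * iterations with the pages mapped to its node, so most accesses
     * stay local. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * b[i];

    printf("a[0] = %f\n", a[0]);
    return 0;
}

In this sketch, the affinity between loop scheduling and data placement is what the abstract's directive extensions are meant to control: without it, a page-based software DSM would fetch remotely mapped pages over the network on nearly every write.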