In this paper, we present the first system that implements OpenMP on a network of shared-memory multiprocessors. This system enables the programmer to rely on a single, standard, shared-memory API for parallelization both within a multiprocessor and between multiprocessors. It is implemented via a translator that converts OpenMP directives to appropriate calls to a modified version of the TreadMarks software distributed shared memory (SDSM) system. In contrast to previous SDSM systems for SMPs, the modified TreadMarks uses POSIX threads for parallelism within an SMP node. This approach greatly simplifies the changes required to the SDSM in order to exploit the intra-node hardware shared memory.

We present performance results for six applications (SPLASH-2 Barnes-Hut and Water, NAS 3D-FFT, SOR, TSP and MGS) running on an SP2 with four four-processor SMP nodes. A comparison between the threaded implementation and the original implementation of TreadMarks shows that using the hardware shared memory within an SMP node significantly reduces the amount of data and the number of messages transmitted between nodes, and consequently achieves speedups up to 30% better than the original versions. We also compare SDSM against message passing. Overall, the speedups of multithreaded TreadMarks programs are within 7-30% of the MPI versions.
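To make the intra-node half of this design concrete, the following is a minimal sketch of how an OpenMP `parallel for` loop might be outlined into a function and run by POSIX threads inside one SMP node. It is an illustration under stated assumptions, not the translator's actual output: the names `loop_args`, `loop_body`, `parallel_scale`, and `NUM_THREADS` are hypothetical, the static block schedule is assumed, and the inter-node sharing that the modified TreadMarks provides is not modeled at all.

```c
#include <pthread.h>
#include <stddef.h>

/* Assumption: one thread per processor of a four-processor SMP node. */
#define NUM_THREADS 4

/* Hypothetical argument record: the environment of the outlined loop body. */
typedef struct {
    double *a;
    const double *b;
    size_t lo, hi;              /* half-open iteration range [lo, hi) */
} loop_args;

/* Outlined body of the source loop:
 *     #pragma omp parallel for
 *     for (i = 0; i < n; i++) a[i] = 2.0 * b[i];
 */
static void *loop_body(void *p) {
    loop_args *arg = p;
    for (size_t i = arg->lo; i < arg->hi; i++)
        arg->a[i] = 2.0 * arg->b[i];
    return NULL;
}

/* Runs the loop with an assumed static block schedule.
 * Returns 0 on success, -1 if a thread could not be created. */
int parallel_scale(double *a, const double *b, size_t n) {
    pthread_t tid[NUM_THREADS];
    loop_args args[NUM_THREADS];
    size_t chunk = (n + NUM_THREADS - 1) / NUM_THREADS;  /* ceiling division */
    int started = 0, rc = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        size_t lo = (size_t)t * chunk;
        size_t hi = lo + chunk;
        if (lo > n) lo = n;     /* clamp so trailing threads get empty ranges */
        if (hi > n) hi = n;
        args[t] = (loop_args){ a, b, lo, hi };
        if (pthread_create(&tid[t], NULL, loop_body, &args[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    /* Joining every started thread plays the role of the implicit barrier
     * at the end of the parallel loop. */
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    return rc;
}
```

Calling `parallel_scale(a, b, n)` on two arrays of length `n` leaves `a[i] == 2.0 * b[i]` for every `i` when it returns 0. Because all threads in the node read and write `a` and `b` directly through hardware shared memory, no messages are needed for this step; in the system described above, only sharing across nodes would go through the SDSM layer.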