Demand for programming environments that exploit clusters of symmetric multiprocessors (SMPs) is increasing. In this paper, we present ParADE, a new programming environment that enables easy, portable, and high-performance programming on SMP clusters. ParADE is an OpenMP programming environment built on a multi-threaded software distributed shared memory (SDSM) system that uses a variant of the home-based lazy release consistency protocol. To boost performance, the runtime system also provides explicit message-passing primitives, making it a hybrid programming environment. Collective communication primitives implement the synchronization and work-sharing directives associated with small data structures, which lessens synchronization overhead and avoids the implicit barriers of work-sharing directives. The OpenMP translator bridges the gap between the OpenMP abstraction and the hybrid programming interfaces of the runtime system. Experiments with several NAS benchmarks and applications on a Linux-based cluster show promising results: ParADE overcomes the performance problem of conventional SDSM-based OpenMP environments.
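The idea of handling a small shared variable with a collective rather than through the SDSM layer can be illustrated with a toy, single-process simulation of a sum reduction. This is only a sketch under stated assumptions, not ParADE's runtime or translator output: the names `allreduce_sum` and `hybrid_reduction`, the static block distribution, and the two-level combine (threads within a node first, then one collective across nodes) are all hypothetical choices made for illustration.

```python
from typing import List


def allreduce_sum(per_node_values: List[float]) -> List[float]:
    """Simulated collective (hypothetical helper): every node contributes one
    value and every node receives the global sum."""
    total = sum(per_node_values)
    return [total for _ in per_node_values]


def hybrid_reduction(data: List[float], num_nodes: int,
                     threads_per_node: int) -> List[float]:
    """Sum `data` across a simulated SMP cluster.

    Threads on the same node combine their partial sums locally (they share
    memory), then a single collective exchanges one value per node. Returns
    the value each node holds afterwards.
    """
    num_workers = num_nodes * threads_per_node
    # Static block distribution of iterations over all threads (an assumption).
    chunk = (len(data) + num_workers - 1) // num_workers

    node_partials = []
    for node in range(num_nodes):
        thread_partials = []
        for thread in range(threads_per_node):
            worker = node * threads_per_node + thread
            start = worker * chunk
            # Slices past the end of `data` are simply empty.
            thread_partials.append(sum(data[start:start + chunk]))
        # Intra-node combine: no inter-node traffic is involved here.
        node_partials.append(sum(thread_partials))

    # Inter-node combine: one collective call in place of a lock-protected
    # update of a shared page followed by a barrier.
    return allreduce_sum(node_partials)
```

In this sketch, calling `hybrid_reduction(list(range(1, 101)), 4, 2)` leaves each of the four simulated nodes holding the same total, so no node needs to fetch a shared page or acquire a lock to read the result.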