Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of, and the programming effort required for, six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI + SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented.
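The paradigm distinction the abstract draws can be illustrated with a toy sketch (not from the paper, which evaluates MPI and SAS implementations on a PC-SMP cluster): under message passing, workers own disjoint data and exchange partial results explicitly, whereas under a shared address space they update common data and rely on synchronization instead of messages. Python threads, which share one address space, serve here purely as an illustration; the function names are hypothetical.

```python
import threading
import queue

def mp_style_sum(data, nworkers=4):
    """Message-passing style: each worker owns a disjoint chunk and
    communicates its partial sum explicitly through a queue."""
    q = queue.Queue()
    chunks = [data[i::nworkers] for i in range(nworkers)]

    def worker(chunk):
        q.put(sum(chunk))  # explicit "send" of the partial result

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # explicit "receive" of one message per worker
    return sum(q.get() for _ in range(nworkers))

def sas_style_sum(data, nworkers=4):
    """Shared-address-space style: workers accumulate into one shared
    location; a lock replaces explicit communication."""
    total = [0]
    lock = threading.Lock()
    chunks = [data[i::nworkers] for i in range(nworkers)]

    def worker(chunk):
        partial = sum(chunk)
        with lock:  # synchronization instead of message exchange
            total[0] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

if __name__ == "__main__":
    data = list(range(1000))
    print(mp_style_sum(data), sas_style_sum(data))
```

Both functions compute the same reduction; the difference lies entirely in how the partial results meet, which is the programmability/performance trade-off the paper quantifies at scale.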