The current trend toward multicore architectures underscores the need for parallelism. While new languages and alternatives are being proposed to support these systems more efficiently, MPI must also face this new challenge. Up-to-date performance evaluations of the current options for programming multicore systems are therefore needed. This paper evaluates the performance of MPI against Unified Parallel C (UPC) and OpenMP on multicore architectures. The analysis of the results shows that MPI is generally the best choice on multicore systems with both shared and hybrid shared/distributed memory, as it makes the best use of data locality, the key performance factor on these systems. UPC exploits the data layout in memory efficiently but suffers from the cost of remote shared memory accesses, whereas OpenMP usually lacks efficient data locality support and is restricted to shared memory systems, which limits its scalability.