ICPP '02 Proceedings of the 2001 International Conference on Parallel Processing
UPC performance and potential: a NPB experimental study
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
A performance analysis of the Berkeley UPC compiler
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Performance Monitoring and Evaluation of a UPC Implementation on a NUMA Architecture
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Automatic nonblocking communication for partitioned global address space programs
Proceedings of the 21st annual international conference on Supercomputing
A characterization of shared data access patterns in UPC programs
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
UPC collective operations optimization
ICCSA'07 Proceedings of the 2007 international conference on Computational science and its applications - Volume Part I
A practical study of UPC using the NAS Parallel Benchmarks
Proceedings of the Third Conference on Partitioned Global Address Space Programing Models
UPC performance evaluation on a multicore system
Proceedings of the Third Conference on Partitioned Global Address Space Programing Models
A performance model for fine-grain accesses in UPC
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Development and performance analysis of a UPC Particle-in-Cell code
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model
Optimizing the Barnes-Hut algorithm in UPC
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
UPC is a parallel programming language based on the concept of partitioned shared memory. There are now several UPC compilers available and several different parallel architectures that support one or more of these compilers. This paper is the first to compare the performance of most of the currently available UPC implementations on several commonly used parallel platforms. These compilers are the GASNet UPC compiler from UC Berkeley, the v1.1 MuPC compiler from Michigan Tech, the Hewlet-Packard v2.2 compiler, and the Intrepid UPC compiler. The parallel architectures used in this study are a 16-node x86 Myrinet cluster, a 32-processor AlphaServer SC-40, and a 48-processor Cray T3E. A STREAM-like microbenchmark was developed to measure fine- and course-grained shared memory accesses. Also measured are five NPB kernels using existing UPC implementations. These measurements and associated observations provide a snapshot of the relative performance of current UPC platforms.