Co-array Fortran (CAF) and Unified Parallel C (UPC) are two emerging languages for single-program, multiple-data (SPMD) global address space programming. These languages boost programmer productivity by providing shared variables for inter-process communication instead of explicit message passing. However, the performance of these emerging languages still has room for improvement. In this paper, we study the performance of variants of the NAS MG, CG, SP, and BT benchmarks on several modern architectures to identify the challenges that must be met to deliver top performance. We compare CAF and UPC variants of these programs with the original Fortran+MPI code. Today, CAF and UPC programs deliver scalable performance on clusters only when written to use bulk communication. Moreover, our experiments uncovered significant performance bottlenecks in the UPC codes on all platforms. We identify the root causes limiting UPC performance, including the synchronization model, the communication efficiency of strided data, and source-to-source translation issues, and we show that they can be remedied with language extensions, new synchronization constructs, and adequate optimization by the back-end C compilers.