Co-array Fortran (CAF) and Unified Parallel C (UPC) are two emerging languages for single-program, multiple-data (SPMD) global address space programming. These languages boost programmer productivity by providing shared variables for inter-process communication instead of explicit message passing. However, the performance of these emerging languages still has room for improvement. In this paper, we study the performance of variants of the NAS MG, CG, SP, and BT benchmarks on several modern architectures to identify the challenges that must be met to deliver top performance. We compare CAF and UPC variants of these programs with the original Fortran+MPI code. Today, CAF and UPC programs deliver scalable performance on clusters only when written to use bulk communication. Moreover, our experiments uncovered significant performance bottlenecks in the UPC codes on all platforms. We identify the root causes limiting UPC performance, including the synchronization model, the communication efficiency of strided data, and source-to-source translation issues, and we show that they can be remedied with language extensions, new synchronization constructs, and adequate optimization by the back-end C compilers.