A performance analysis of the Berkeley UPC compiler

Authors:
Parry Husbands;Costin Iancu;Katherine Yelick
Affiliations:
Lawrence Berkeley National Laboratory;Lawrence Berkeley National Laboratory;University of California at Berkeley
Venue:
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Year:
2003

References: 12
Cited by: 42

Global communication analysis and optimization

PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Compiler-based prefetching for recursive data structures

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Analyses and optimizations for shared address space programs

Journal of Parallel and Distributed Computing - Special issue on compilation techniques for distributed memory systems
Co-array Fortran for parallel programming

ACM SIGPLAN Fortran Forum
Type systems for distributed data structures

Proceedings of the 27th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Communication optimizations for parallel C programs

Journal of Parallel and Distributed Computing - Special issue on compilation and architectural support for parallel applications
UPC Benchmarking Issues

ICPP '02 Proceedings of the 2001 International Conference on Parallel Processing
UPC performance and potential: a NPB experimental study

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Hiding Relaxed Memory Consistency with Compilers

PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
An Evaluation of Current High-Performance Networks

IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Titanium Language Reference Manual

Titanium Language Reference Manual
GASNet Specification, v1.1

GASNet Specification, v1.1

Evaluating support for global address space languages on the Cray X1

Proceedings of the 18th annual international conference on Supercomputing
Benchmark Measurements of Current UPC Platforms

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 15 - Volume 16
An evaluation of global address space languages: Co-Array Fortran and Unified Parallel C

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Communication Optimizations for Fine-Grained UPC Applications

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Shared memory programming for large scale machines

Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
On using connection-oriented vs. connection-less transport for performance and scalability of collective and one-sided operations: trade-offs and impact

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Scalability analysis of SPMD codes using expectations

Proceedings of the 21st annual international conference on Supercomputing
Automatic nonblocking communication for partitioned global address space programs

Proceedings of the 21st annual international conference on Supercomputing
An Approach To Data Distributions in Chapel

International Journal of High Performance Computing Applications
Parallel Languages and Compilers: Perspective From the Titanium Experience

International Journal of High Performance Computing Applications
Parallel Programmability and the Chapel Language

International Journal of High Performance Computing Applications
Performance without pain = productivity: data layout and collective communication in UPC

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations

International Journal of High Performance Computing and Networking
Multi-threading and one-sided communication in parallel LU factorization

Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Performance portable optimizations for loops containing communication operations

Proceedings of the 22nd annual international conference on Supercomputing
From FORTRAN 77 to locality-aware high productivity languages for peta-scale computing

Scientific Programming - Fortran Programming Language and Scientific Programming: 50 Years of Mutual Growth
Servo: a programming model for many-core computing

ACM SIGARCH Computer Architecture News
COMIC: a coherent shared memory interface for Cell BE

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Survey on Parallel Programming Model

NPC '08 Proceedings of the IFIP International Conference on Network and Parallel Computing
Scheduling dynamic parallelism on accelerators

Proceedings of the 6th ACM conference on Computing frontiers
Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Model-guided autotuning of high-productivity languages for petascale computing

Proceedings of the 18th ACM international symposium on High performance distributed computing
Elastic computing: a framework for transparent, portable, and adaptive multi-core heterogeneous computing

Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
A characterization of shared data access patterns in UPC programs

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Semi-automatic extraction and exploitation of hierarchical pipeline parallelism using profiling information

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
The Paralax infrastructure: automatic parallelization with a helping hand

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
A performance model for fine-grain accesses in UPC

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Optimizing bandwidth limited problems using one-sided communication and overlap

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Unified parallel C for GPU clusters: language extensions and compiler implementation

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Introducing mNUMA: an extended PGAS architecture

Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model
The myrmics memory allocator: hierarchical, message-passing allocation for global address spaces

Proceedings of the 2012 international symposium on Memory Management
Exploring cross-layer power management for PGAS applications on the SCC platform

Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Productivity and Performance of Global-View Programming with XcalableMP PGAS Language

CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Elastic computing: A portable optimization framework for hybrid computers

Parallel Computing
The RACECAR heuristic for automatic function specialization on multi-core heterogeneous systems

Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
UPCBLAS: a library for parallel matrix computations in Unified Parallel C

Concurrency and Computation: Practice & Experience
CarFast: achieving higher statement coverage faster

Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering
Towards a complexity model for design and analysis of PGAS-based algorithms

HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Designing energy efficient communication runtime systems: a view from PGAS models

The Journal of Supercomputing
Scaling data race detection for partitioned global address space programs

Proceedings of the 27th ACM International Conference on Supercomputing
Data deduplication in a hybrid architecture for improving write performance

Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
Integrating profile-driven parallelism detection and machine-learning-based mapping

ACM Transactions on Architecture and Code Optimization (TACO)

Abstract

Unified Parallel C (UPC) is a parallel language that uses a Single Program Multiple Data (SPMD) model of parallelism within a global address space. The global address space simplifies programming, especially for applications with irregular data structures that lead to fine-grained sharing between threads. Recent results have shown that the performance of UPC using a commercial compiler is comparable to that of MPI [7]. In this paper we describe a portable open source compiler for UPC. Our goal is to achieve similar performance while enabling easy porting of the compiler and runtime, and to provide a framework that allows for extensive optimizations. We identify some of the challenges in compiling UPC and use a combination of micro-benchmarks and application kernels to show that our compiler has low overhead for basic operations on shared data and is competitive with, and sometimes faster than, the commercial HP compiler. We also investigate several communication optimizations and show significant benefits from hand-optimizing the generated code.
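
The abstract does not say which communication optimizations were investigated, so the following is only an illustrative sketch of why fine-grained sharing can be costly and what a hand optimization might look like. It is a toy Python model, not UPC code and not the Berkeley UPC runtime: the class `SharedArray`, its methods `get` and `bulk_get`, and the `messages` counter are all hypothetical names. It assumes a cyclic layout in which element `i` has affinity to thread `i % threads` (the default layout for a UPC shared array) and counts one simulated message per remote access.

```python
# Toy model of a UPC-style shared array. NOT the Berkeley UPC runtime API:
# SharedArray, get, bulk_get, and messages are hypothetical names.

class SharedArray:
    """Array of n elements distributed cyclically over `threads` owners."""

    def __init__(self, n, threads):
        self.threads = threads
        self.data = list(range(n))   # element i simply holds the value i
        self.messages = 0            # simulated network messages issued

    def owner(self, i):
        # Cyclic layout: element i has affinity to thread i % threads.
        return i % self.threads

    def get(self, me, i):
        # Fine-grained read: one message for every element that is not local.
        if self.owner(i) != me:
            self.messages += 1
        return self.data[i]

    def bulk_get(self, me, owner):
        # Coalesced read: fetch all elements owned by `owner` in one message.
        if owner != me:
            self.messages += 1
        return self.data[owner::self.threads]


def sum_fine_grained(arr, me):
    # Thread `me` reads the whole array one element at a time.
    return sum(arr.get(me, i) for i in range(len(arr.data)))


def sum_coalesced(arr, me):
    # Thread `me` reads the whole array with one bulk transfer per owner.
    return sum(sum(arr.bulk_get(me, t)) for t in range(arr.threads))
```

With 16 elements on 4 threads, `sum_fine_grained` called as thread 0 issues 12 simulated messages (one per remote element), while `sum_coalesced` issues 3 (one per remote owner) and returns the same sum. The model ignores latency, bandwidth, and overlap, so it shows only the message-count effect of coalescing, not actual performance.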