Global communication analysis and optimization
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Analyses and optimizations for shared address space programs
Journal of Parallel and Distributed Computing - Special issue on compilation techniques for distributed memory systems
Co-array Fortran for parallel programming
ACM SIGPLAN Fortran Forum
Type systems for distributed data structures
Proceedings of the 27th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Communication optimizations for parallel C programs
Journal of Parallel and Distributed Computing - Special issue on compilation and architectural support for parallel applications
ICPP '02 Proceedings of the 2001 International Conference on Parallel Processing
UPC performance and potential: a NPB experimental study
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Hiding Relaxed Memory Consistency with Compilers
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
An Evaluation of Current High-Performance Networks
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Titanium Language Reference Manual
Titanium Language Reference Manual
GASNet Specification, v1.1
Evaluating support for global address space languages on the Cray X1
Proceedings of the 18th annual international conference on Supercomputing
Benchmark Measurements of Current UPC Platforms
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 15 - Volume 16
An evaluation of global address space languages: co-array fortran and unified parallel C
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Communication Optimizations for Fine-Grained UPC Applications
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Shared memory programming for large scale machines
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Scalability analysis of SPMD codes using expectations
Proceedings of the 21st annual international conference on Supercomputing
Automatic nonblocking communication for partitioned global address space programs
Proceedings of the 21st annual international conference on Supercomputing
An Approach To Data Distributions in Chapel
International Journal of High Performance Computing Applications
Parallel Languages and Compilers: Perspective From the Titanium Experience
International Journal of High Performance Computing Applications
Parallel Programmability and the Chapel Language
International Journal of High Performance Computing Applications
Performance without pain = productivity: data layout and collective communication in UPC
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations
International Journal of High Performance Computing and Networking
Multi-threading and one-sided communication in parallel LU factorization
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Performance portable optimizations for loops containing communication operations
Proceedings of the 22nd annual international conference on Supercomputing
From FORTRAN 77 to locality-aware high productivity languages for peta-scale computing
Scientific Programming - Fortran Programming Language and Scientific Programming: 50 Years of Mutual Growth
Servo: a programming model for many-core computing
ACM SIGARCH Computer Architecture News
COMIC: a coherent shared memory interface for cell be
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Survey on Parallel Programming Model
NPC '08 Proceedings of the IFIP International Conference on Network and Parallel Computing
Scheduling dynamic parallelism on accelerators
Proceedings of the 6th ACM conference on Computing frontiers
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Model-guided autotuning of high-productivity languages for petascale computing
Proceedings of the 18th ACM international symposium on High performance distributed computing
Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
A characterization of shared data access patterns in UPC programs
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
The Paralax infrastructure: automatic parallelization with a helping hand
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
A performance model for fine-grain accesses in UPC
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Optimizing bandwidth limited problems using one-sided communication and overlap
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Unified parallel C for GPU clusters: language extensions and compiler implementation
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Introducing mNUMA: an extended PGAS architecture
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model
The myrmics memory allocator: hierarchical,message-passing allocation for global address spaces
Proceedings of the 2012 international symposium on Memory Management
Exploring cross-layer power management for PGAS applications on the SCC platform
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Productivity and Performance of Global-View Programming with XcalableMP PGAS Language
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Elastic computing: A portable optimization framework for hybrid computers
Parallel Computing
The RACECAR heuristic for automatic function specialization on multi-core heterogeneous systems
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
UPCBLAS: a library for parallel matrix computations in Unified Parallel C
Concurrency and Computation: Practice & Experience
CarFast: achieving higher statement coverage faster
Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering
Towards a complexity model for design and analysis of PGAS-based algorithms
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Designing energy efficient communication runtime systems: a view from PGAS models
The Journal of Supercomputing
Scaling data race detection for partitioned global address space programs
Proceedings of the 27th international ACM conference on International conference on supercomputing
Data deduplication in a hybrid architecture for improving write performance
Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
Integrating profile-driven parallelism detection and machine-learning-based mapping
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Unified Parallel C (UPC) is a parallel language that uses a Single Program Multiple Data (SPMD) model of parallelism within a global address space. The global address space is used to simplify programming, especially on applications with irregular data structures that lead to fine-grained sharing between threads. Recent results have shown that the performance of UPC using a commercial compiler is comparable to that of MPI [7]. In this paper we describe a portable open source compiler for UPC. Our goal is to achieve a similar performance while enabling easy porting of the compiler and runtime, and also provide a framework that allows for extensive optimizations. We identify some of the challenges in compiling UPC and use a combination of micro-benchmarks and application kernels to show that our compiler has low overhead for basic operations on shared data and is competitive, and sometimes faster than, the commercial HP compiler. We also investigate several communication optimizations, and show significant benefits by hand-optimizing the generated code.