An extended set of FORTRAN basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
A Unified Framework for Optimizing Communication in Data-Parallel Programs
IEEE Transactions on Parallel and Distributed Systems
Co-array Fortran for parallel programming
ACM SIGPLAN Fortran Forum
ZPL: A Machine Independent Programming Language for Parallel Computers
IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
Compiler optimizations for scalable parallel systems
Runtime Support and Compilation Methods for User-Specified Irregular Data Distributions
IEEE Transactions on Parallel and Distributed Systems
Detecting and Using Affinity in an Automatic Data Distribution Tool
LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
Parallelization of Benchmarks for Scalable Shared-Memory Multiprocessors
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Programming for parallelism and locality with hierarchically tiled arrays
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
A Parallel Numerical Library for UPC
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Hi-index | 0.00 |
Partitioned Global Address Space (PGAS) languages offer an attractive, high-productivity programming model for programming large-scale parallel machines. PGAS languages, such as Unified Parallel C (UPC), combine the simplicity of shared-memory programming with the efficiency of the message-passing paradigm by allowing users control over the data layout. PGAS languages distinguish between private, shared-local, and shared-remote memory, with shared-remote accesses typically much more expensive than shared-local and private accesses, especially on distributed memory machines where shared-remote access implies communication over a network.In this paper we present a simple extension to the UPC language that allows the programmer to block shared arrays in multiple dimensions. We claim that this extension allows for better control of locality, and therefore performance, in the language.We describe an analysis that allows the compiler to distinguish between local shared array accesses and remote shared array accesses. Local shared array accesses are then transformed into direct memory accesses by the compiler, saving the overhead of a locality check at runtime. We present results to show that locality analysis is able to significantly reduce the number of shared accesses.