Automatic communication coalescing for irregular computations in UPC language

Authors:
Michail Alvanos;Montse Farreras;Ettore Tiotto;Xavier Martorell
Affiliations:
Programming Models, Barcelona Supercomputing Center;Universitat Politècnica de Catalunya;Static Compilation Technology, IBM Canada Software Lab, Canada;Universitat Politècnica de Catalunya
Venue:
CASCON '12 Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research
Year:
2012

Abstract

Partitioned Global Address Space (PGAS) languages emerged to improve programmer productivity on large-scale parallel machines. However, fine-grained accesses to shared data structures have been identified as one of the main performance bottlenecks of PGAS languages. Avoiding them requires either manual or compiler-assisted code optimization. Manual code transformations increase program complexity and reduce programmer productivity. Compiler optimizations of fine-grained accesses, on the other hand, require knowledge of the physical data mapping and the use of parallel loop constructs. This paper presents an optimization that prefetches and coalesces shared accesses at runtime. The resulting larger messages reduce the impact of remote access latency and increase the efficiency of network communication. We have implemented our optimization for the Unified Parallel C (UPC) language. An experimental evaluation in a distributed-memory environment, using a Power7 cluster, demonstrates the benefits of our optimization.
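
The idea of coalescing can be illustrated with a minimal sketch. It is not the authors' runtime and it omits prefetching entirely; it only shows how irregular accesses to a shared array might be grouped by owning thread so that one bulk request per owner replaces one message per element. The function names (`owner`, `coalesce`), the index list, and the layout parameters are hypothetical. The only assumption taken from UPC is the block-cyclic affinity rule, where element `i` belongs to thread `(i / block_size) mod THREADS`.

```python
from collections import defaultdict


def owner(index, block_size, threads):
    """Thread with affinity to element `index` under a block-cyclic layout."""
    return (index // block_size) % threads


def coalesce(indices, block_size, threads, my_thread):
    """Group remote indices by owning thread (hypothetical helper).

    Returns a dict mapping each remote owner to the sorted, de-duplicated
    list of indices to fetch from it in a single bulk request.
    """
    requests = defaultdict(list)
    for i in indices:
        t = owner(i, block_size, threads)
        if t != my_thread:  # local accesses need no message
            requests[t].append(i)
    return {t: sorted(set(idx)) for t, idx in requests.items()}


# Irregular access pattern as seen by thread 0 (illustrative values only).
indices = [5, 17, 6, 42, 5, 33, 18, 7]
plan = coalesce(indices, block_size=4, threads=4, my_thread=0)

# One message per remote access versus one message per remote owner.
fine_grained = sum(1 for i in indices if owner(i, 4, 4) != 0)
coalesced = len(plan)
print(plan, fine_grained, coalesced)
```

In this toy pattern, five remote element accesses collapse into two bulk requests, which is the effect the abstract attributes to larger messages.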