Runtime detection and optimization of collective communication patterns
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
The steady increase of parallelism in high-performance computing platforms implies that communication will become increasingly important in large-scale applications. In this work, we tackle the problem of transparently optimizing large-scale communication patterns using online compilation techniques. We use the Group Operation Assembly Language (GOAL), an abstract parallel dataflow definition language, to specify our transformations in a device-independent manner. We develop fast schemes that analyze dataflow and synchronization semantics in GOAL and detect whether part or all of a communication pattern expresses a known collective communication operation. Detecting collective operations allows us to replace the matched patterns with highly optimized algorithms or low-level hardware calls and thus improve performance significantly. Benchmark results suggest that our technique can improve performance by orders of magnitude compared with various optimized algorithms written in Co-Array Fortran. Detecting collective operations also improves the programmability of parallel languages, because the user does not have to understand the detailed semantics of high-level communication operations in order to generate efficient and scalable code.
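To make the idea of detecting a collective in a point-to-point pattern concrete, the following is a minimal sketch of recognizing a broadcast. It does not use GOAL and is not the detection scheme of this work: the `Send` record, the `detect_broadcast` function, and the representation of a pattern as a flat list of sends are all assumptions made for illustration, and synchronization semantics are ignored entirely.

```python
from collections import namedtuple

# Hypothetical record for one point-to-point transfer (not a GOAL construct).
Send = namedtuple("Send", ["src", "dst", "data_id"])


def detect_broadcast(sends, num_ranks):
    """Return the root rank if `sends` forms a broadcast tree, else None.

    Assumed criteria: a single payload, every non-root rank receives it
    exactly once, the root never receives, and every rank is reachable
    from the root by following the sends.
    """
    # All transfers must carry the same payload.
    if len({s.data_id for s in sends}) != 1:
        return None

    # No rank may receive the payload more than once.
    receivers = [s.dst for s in sends]
    if len(receivers) != len(set(receivers)):
        return None

    # The root is the only sender that never receives.
    roots = {s.src for s in sends} - set(receivers)
    if len(roots) != 1:
        return None
    root = next(iter(roots))

    # Every rank other than the root must receive.
    if set(receivers) != set(range(num_ranks)) - {root}:
        return None

    # Reject disconnected cycles: walk the tree starting at the root.
    children = {}
    for s in sends:
        children.setdefault(s.src, []).append(s.dst)
    reached = {root}
    frontier = [root]
    while frontier:
        rank = frontier.pop()
        for child in children.get(rank, []):
            if child not in reached:
                reached.add(child)
                frontier.append(child)
    return root if len(reached) == num_ranks else None


if __name__ == "__main__":
    # A hand-written tree on four ranks: 0 -> 1, 0 -> 2, 1 -> 3.
    tree = [Send(0, 1, "buf"), Send(0, 2, "buf"), Send(1, 3, "buf")]
    print(detect_broadcast(tree, 4))
```

Once a pattern is matched in this way, a runtime could in principle substitute an optimized collective, such as a hardware-supported broadcast, for the individual sends. Which replacement is appropriate depends on the platform and is outside the scope of this sketch.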