The implementation and optimization of collective communication operations is an active field of research. Such operations directly influence application performance and must map communication requirements optimally onto steadily changing network architectures. In this work, we define an abstract domain-specific language to express arbitrary group communication operations. We show the universality of this language and how all existing collective operations can be implemented with it. By design, it readily lends itself to blocking and nonblocking execution, as well as to offloaded execution of complex group communication operations. We also define several offline and online optimizations (compiler transformations and scheduling decisions, respectively) to improve the overall performance of an operation. Performance results show that the overhead of expressing current collective operations is negligible in comparison to the potential gains of a highly optimized implementation.
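As a rough illustration of the general idea of expressing a collective as a per-rank schedule of primitive operations with dependencies, the following Python sketch builds a broadcast schedule and checks that it completes. It is not the language defined in this work: the `Op` class and the functions `binomial_bcast_schedule` and `run` are hypothetical names introduced here, and the binomial tree is simply a commonly used broadcast algorithm chosen as an example.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """One primitive operation in a rank's schedule (hypothetical representation)."""
    kind: str                                  # "send" or "recv"
    peer: int                                  # partner rank
    deps: list = field(default_factory=list)   # indices of local ops that must finish first


def binomial_bcast_schedule(rank, size, root=0):
    """Return the ops that `rank` executes in a binomial-tree broadcast."""
    vrank = (rank - root) % size  # rank relative to the root
    ops = []
    recv_idx = None
    mask = 1
    # Every non-root rank receives exactly once, from its parent in the tree.
    while mask < size:
        if vrank & mask:
            parent = (vrank - mask + root) % size
            ops.append(Op("recv", parent))
            recv_idx = 0
            break
        mask <<= 1
    # Afterwards it forwards the data to its children; each send depends on the recv.
    mask >>= 1
    while mask > 0:
        if vrank + mask < size:
            child = (vrank + mask + root) % size
            deps = [recv_idx] if recv_idx is not None else []
            ops.append(Op("send", child, deps))
        mask >>= 1
    return ops


def run(schedules):
    """Toy executor: match sends to receives; True if every op completes."""
    done = [[False] * len(s) for s in schedules]
    progress = True
    while progress:
        progress = False
        for rank, sched in enumerate(schedules):
            for i, op in enumerate(sched):
                if done[rank][i] or op.kind != "send":
                    continue
                if not all(done[rank][d] for d in op.deps):
                    continue  # dependencies not yet satisfied
                # Find a pending receive that the peer posted for this rank.
                for j, peer_op in enumerate(schedules[op.peer]):
                    if (peer_op.kind == "recv" and peer_op.peer == rank
                            and not done[op.peer][j]):
                        done[rank][i] = done[op.peer][j] = True
                        progress = True
                        break
    return all(all(d) for d in done)


if __name__ == "__main__":
    schedules = [binomial_bcast_schedule(r, 8) for r in range(8)]
    for r, sched in enumerate(schedules):
        print(r, [(op.kind, op.peer, op.deps) for op in sched])
    print("completed:", run(schedules))
```

Because each schedule is plain data, the same description could in principle be run to completion (blocking), progressed incrementally (nonblocking), or handed to another execution engine, and a transformation pass could rewrite it before execution. The executor above is only a toy and makes no claims about performance.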