This paper presents experiences and results obtained in optimizing the parallel communication performance of a production-quality medical image reconstruction application. The fundamental communication operations in the application's principal algorithm are collective reductions. The overhead of these operations was reduced by transforming the algorithm to overlap its computation with its communication. Several approaches to communication progress were studied, both user-directed and asynchronous. Experimental results comparing the new approach to the previous implementation show overall application performance improvements of up to 8% when run on 32 nodes.
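The overlap idea can be illustrated with a minimal sketch. This is not the application's code: a background thread stands in for a non-blocking collective reduction, and every name here (`reduce_sum`, `local_work`, `overlapped_step`, the sample data) is hypothetical. It shows only the control structure: start the reduction, do independent work, and wait when the reduced value is needed.

```python
from concurrent.futures import ThreadPoolExecutor


def reduce_sum(partials):
    """Stand-in for a collective reduction: element-wise sum over all ranks."""
    return [sum(column) for column in zip(*partials)]


def local_work(block):
    """Stand-in for computation that does not depend on the reduction result."""
    return [x * x for x in block]


def blocking_step(partials, next_block):
    reduced = reduce_sum(partials)      # communication finishes first
    computed = local_work(next_block)   # computation starts only afterwards
    return reduced, computed


def overlapped_step(partials, next_block, pool):
    pending = pool.submit(reduce_sum, partials)  # start the reduction, do not wait
    computed = local_work(next_block)            # independent work proceeds meanwhile
    reduced = pending.result()                   # wait only when the result is needed
    return reduced, computed


# Assumed sample data: one partial result per simulated rank.
partials = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
next_block = [1.0, 2.0, 3.0]

with ThreadPoolExecutor(max_workers=1) as pool:
    overlapped = overlapped_step(partials, next_block, pool)

# Overlapping changes the schedule, not the numerical result.
assert overlapped == blocking_step(partials, next_block)
```

In a real MPI program the background thread would be replaced by a non-blocking collective plus a later wait call. The sketch demonstrates the reordering only, not a speedup, because the Python threads here do not model network latency.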