MPI Reduction Operations for Sparse Floating-point Data
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
We describe simple, easy-to-implement MPI-library-internal functionality that enables MPI reduction operations to be performed more efficiently as the sparsity (the fraction of neutral elements for the given operator) of the input and intermediate result vectors increases. Using this functionality, we give an implementation of the MPI_Reduce collective operation that exploits the sparsity of both input and intermediate result vectors completely transparently to the application programmer. Experiments carried out on a 64-core Intel Nehalem multi-core cluster with InfiniBand interconnect show considerable and worthwhile improvements as the sparsity of the input grows: about a factor of three with 1% non-zero elements, which is close to the best possible for this approach. The overhead incurred for dense vectors is negligible compared to the same implementation without sparsity exploitation. For both very small and large vectors, the implemented SPS_Reduce function is faster than the native MPI_Reduce of the MPI library used, indicating that the reported improvements are not artifacts of suboptimal reduction algorithms.
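To make the idea concrete, the following is a minimal C sketch of a sparsity-exploiting reduction, not the paper's implementation: it assumes MPI_SUM on doubles (neutral element 0.0) and gathers packed (index, value) pairs linearly to the root instead of using a tree algorithm. The names SPS_Reduce_sum, sps_pair, and sps_pack are hypothetical illustrations.

/* Hypothetical sketch: each non-root rank packs the nonzero entries of
 * its vector as (index, value) pairs and sends them to the root, which
 * merges them into its own vector. A production implementation would
 * use a tree-based algorithm and fall back to dense payloads whenever
 * the packed form would be larger. */
#include <mpi.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { int index; double value; } sps_pair;

/* Pack the nonzeros of v[0..n-1] into out; return the number packed. */
static int sps_pack(const double *v, int n, sps_pair *out)
{
    int k = 0;
    for (int i = 0; i < n; i++)
        if (v[i] != 0.0) { out[k].index = i; out[k].value = v[i]; k++; }
    return k;
}

/* Sum-reduce n doubles onto rank root of comm. Every rank passes its
 * contribution in result[]; on the root, result[] holds the sum on exit. */
int SPS_Reduce_sum(double *result, int n, int root, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* Derived datatype describing one (index, value) pair. */
    MPI_Datatype pair_type;
    int blens[2] = { 1, 1 };
    MPI_Aint displs[2] = { offsetof(sps_pair, index),
                           offsetof(sps_pair, value) };
    MPI_Datatype types[2] = { MPI_INT, MPI_DOUBLE };
    MPI_Type_create_struct(2, blens, displs, types, &pair_type);
    MPI_Type_commit(&pair_type);

    sps_pair *buf = malloc((size_t)n * sizeof *buf);

    if (rank != root) {
        /* Send only the nonzero entries; the message length carries
         * the count implicitly. */
        int k = sps_pack(result, n, buf);
        MPI_Send(buf, k, pair_type, root, 0, comm);
    } else {
        for (int r = 0; r < size; r++) {
            if (r == root) continue;
            MPI_Status st;
            int k;
            MPI_Recv(buf, n, pair_type, r, 0, comm, &st);
            MPI_Get_count(&st, pair_type, &k);
            for (int i = 0; i < k; i++)
                result[buf[i].index] += buf[i].value; /* merge nonzeros */
        }
    }

    free(buf);
    MPI_Type_free(&pair_type);
    return MPI_SUCCESS;
}

The packed representation pays off only while the nonzero count stays below roughly n * sizeof(double) / sizeof(sps_pair); the transparent behavior described in the abstract suggests switching per message between sparse and dense payloads, so that dense inputs incur essentially no extra cost.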