General-Purpose Graphics Processing Units (GPGPUs) are becoming a common component of modern supercomputing systems. Many MPI applications are being modified to take advantage of the superior compute potential offered by GPUs. To facilitate this process, many MPI libraries are being extended to support MPI communication from GPU device memory. However, there is no standardized benchmark suite that helps users evaluate common communication models on GPU clusters and compare different MPI libraries fairly. In this paper, we extend the widely used OSU Micro-Benchmarks (OMB) suite with benchmarks that evaluate the performance of point-to-point, multi-pair, and collective MPI communication for different GPU cluster configurations. We illustrate the benefits of the proposed benchmarks using the MVAPICH2 and OpenMPI libraries.