Exploring the multiple-GPU design space

Authors:
Dana Schaa;David Kaeli
Affiliations:
Department of Electrical and Computer Engineering, Northeastern University, USA;Department of Electrical and Computer Engineering, Northeastern University, USA
Venue:
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
Year:
2009

Citing: 0
Cited by: 10

An adaptive performance modeling tool for GPU architectures
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Profile-guided optimization of critical medical imaging algorithms
ISBI'09 Proceedings of the Sixth IEEE international conference on Symposium on Biomedical Imaging: From Nano to Macro

Non-negative matrix factorization implementation using graphic processing units
IDEAL'10 Proceedings of the 11th international conference on Intelligent data engineering and automated learning

Achieving a single compute device image in OpenCL for multiple GPUs
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming

Scalable framework for mapping streaming applications onto multi-GPU systems
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming

Tuning solution of large non-Hermitian linear systems on multiple graphics processing unit accelerated workstations
International Journal of High Performance Computing Applications

PARTANS: An autotuning framework for stencil computation on multi-GPU systems
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers

GPU optimization of convolution for large 3-d real images
ACIVS'12 Proceedings of the 14th international conference on Advanced Concepts for Intelligent Vision Systems

Parallel implementation of a X-ray tomography reconstruction algorithm based on MPI and CUDA
Proceedings of the 20th European MPI Users' Group Meeting

Efficient co-processor utilization in database query processing
Information Systems


Abstract

Graphics Processing Units (GPUs) have grown in popularity because of their impressive processing capabilities and, with general-purpose programming interfaces such as NVIDIA's CUDA, are becoming the platform of choice in the scientific computing community. Previous studies that used GPUs focused on obtaining significant performance gains from execution on a single GPU. These studies employed low-level, architecture-specific tuning to achieve sizeable benefits over multicore CPU execution. In this paper, we consider the benefits of running on multiple (parallel) GPUs to provide further performance speedup. Our methodology allows developers to accurately predict execution time for GPU applications while varying the number and configuration of the GPUs and the size of the input data set. This is a natural next step in GPU computing because it allows researchers to determine the most appropriate GPU configuration for an application without having to purchase hardware or write the code for a multiple-GPU implementation. When used to predict performance on six scientific applications, our framework produces accurate performance estimates (11% difference on average, with a maximum difference of 40% in a single case) for a range of short- and long-running scientific programs.
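
The abstract does not spell out the prediction model, so the following is only a rough sketch of the general idea of estimating multi-GPU execution time from a single-GPU measurement while varying GPU count, bus configuration, and input size. Everything in it is an assumption made for illustration and is not taken from the paper: the `GpuConfig` fields, the even division of kernel time across GPUs, the lack of overlap between transfers and computation, and the shared-bus rule. The `percent_difference` helper shows one plausible way a "difference" between predicted and measured time could be computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    """Hypothetical description of a multi-GPU setup (not from the paper)."""
    num_gpus: int
    bus_bandwidth_gb_s: float  # assumed host-to-device bandwidth per link, GB/s
    shared_bus: bool = False   # True if all GPUs share a single link


def predict_time(single_gpu_kernel_s: float, data_gb: float, cfg: GpuConfig) -> float:
    """Toy estimate: kernel time splits evenly; transfers do not overlap compute."""
    if cfg.num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    compute_s = single_gpu_kernel_s / cfg.num_gpus
    # On a shared link the whole data set crosses one bus; otherwise each GPU
    # receives its own slice over its own link in parallel.
    per_link_gb = data_gb if cfg.shared_bus else data_gb / cfg.num_gpus
    transfer_s = per_link_gb / cfg.bus_bandwidth_gb_s
    return compute_s + transfer_s


def percent_difference(predicted_s: float, measured_s: float) -> float:
    """Absolute difference between prediction and measurement, in percent."""
    return abs(predicted_s - measured_s) / measured_s * 100.0


if __name__ == "__main__":
    # Sweep GPU count for an assumed 8 s single-GPU kernel and 4 GB of input.
    for n in (1, 2, 4):
        cfg = GpuConfig(num_gpus=n, bus_bandwidth_gb_s=4.0)
        print(n, predict_time(8.0, 4.0, cfg))
```

A sweep like the one in the `__main__` block is the kind of what-if comparison the abstract describes: configurations can be compared on paper before any hardware is bought or any multi-GPU code is written.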