Optimising data movement rates for parallel processing applications on graphics processors

Authors:
Owen Harrison;John Waldron
Affiliations:
Computer Architecture Group, Trinity College Dublin, Dublin, Ireland;Computer Architecture Group, Trinity College Dublin, Dublin, Ireland
Venue:
PDCN'07: Proceedings of the 25th IASTED International Multi-Conference: Parallel and Distributed Computing and Networks
Year:
2007

Abstract

Graphics processing units (GPUs) are playing an increasingly important role in highly parallelisable non-graphical applications. With the latest graphics cards boasting a theoretical 165 GFLOPS and 54 GB/s of memory bandwidth spread across 48 ALUs, it is easy to see why. The GPU architecture is particularly suited to the parallel stream processing paradigm, which is characterised by low levels of data dependency, a high data-to-instruction ratio and predictable memory access patterns. One largely ignored yet key bottleneck for this type of processing on GPUs is transfer performance to and from the graphics card, for both download and readback. Existing tools give developers considerable assistance in many areas of GPU application development, but very limited assistance in obtaining the best bi-directional data transfer performance. In this paper, we discuss these limitations and present new investigative tools that allow developers of general-purpose GPU applications to explore the complex array of configuration states that affect both download and readback performance.
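
To illustrate why transfer rates can dominate, the following is a minimal back-of-the-envelope sketch, not the paper's tooling. It assumes a simple model in which download, computation and readback do not overlap. The 165 GFLOPS figure comes from the abstract; the download and readback rates and the work per byte are hypothetical placeholders, not measurements.

```python
def transfer_fraction(flops_per_byte, download_gb_s, readback_gb_s, gpu_gflops):
    """Fraction of wall time spent moving data, assuming no overlap
    between download, compute and readback (a simplifying assumption).

    All quantities are per byte of input, with one byte read back
    per byte downloaded.
    """
    download_s = 1.0 / (download_gb_s * 1e9)       # seconds to send one byte
    readback_s = 1.0 / (readback_gb_s * 1e9)       # seconds to return one byte
    compute_s = flops_per_byte / (gpu_gflops * 1e9)  # seconds of ALU work per byte
    total_s = download_s + compute_s + readback_s
    return (download_s + readback_s) / total_s


# Hypothetical rates chosen only for illustration.
fraction = transfer_fraction(
    flops_per_byte=10.0,   # assumed arithmetic intensity
    download_gb_s=1.0,     # assumed host-to-card rate
    readback_gb_s=0.5,     # assumed card-to-host rate
    gpu_gflops=165.0,      # theoretical peak quoted in the abstract
)
print(f"share of time spent on transfers: {fraction:.1%}")
```

Under these assumed numbers, transfers account for the large majority of the run time, which is the kind of imbalance the abstract describes. Raising either transfer rate in the call above shows how strongly the overall rate depends on the transfer configuration.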