Optimising data movement rates for parallel processing applications on graphics processors

  • Authors:
  • Owen Harrison;John Waldron

  • Affiliations:
  • Computer Architecture Group, Trinity College Dublin, Dublin, Ireland;Computer Architecture Group, Trinity College Dublin, Dublin, Ireland

  • Venue:
  • PDCN'07 Proceedings of the 25th IASTED International Multi-Conference: parallel and distributed computing and networks
  • Year:
  • 2007

Abstract

Graphics processing units (GPUs) are playing an increasingly important role in highly parallelisable non-graphical applications. With the latest graphics cards boasting a theoretical 165 GFLOPS and 54 GB/s of memory bandwidth spread across 48 ALUs, it is easy to see why. The GPU architecture is particularly suited to the parallel stream processing paradigm, which is characterised by low levels of data dependency, a high data-to-instruction ratio and predictable memory access patterns. One largely ignored yet key bottleneck for this type of processing on GPUs is transfer performance to and from the graphics card, for both download and readback. Existing tools give developers great assistance in many areas of GPU application development, but very limited assistance in achieving the best bi-directional data transfer performance. In this paper, we discuss these limitations and present new investigative tools that allow developers of general-purpose GPU applications to explore the complex array of configuration states that affect both download and readback performance.