Fast and approximate stream mining of quantiles and frequencies using graphics processors
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
OpenVIDIA: parallel GPU computer vision
Proceedings of the 13th annual ACM international conference on Multimedia
GPUTeraSort: high performance graphics co-processor sorting for large database management
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
OpenGL(R) Programming Guide: The Official Guide to Learning OpenGL(R), Version 2 (5th Edition) (OpenGL)
AES Encryption Implementation and Analysis on Commodity Graphics Processing Units
CHES '07 Proceedings of the 9th international workshop on Cryptographic Hardware and Embedded Systems
Using GPUs to improve multigrid solver performance on a cluster
International Journal of Computational Science and Engineering
Analysis of Parallel Algorithms for Energy Conservation with GPU
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Energy cost evaluation of parallel algorithms for multiprocessor systems
Cluster Computing
Hi-index | 0.00 |
Graphics processing units(GPUs) are starting to play an increasingly important role in non-graphical applications which are highly parallelisable. With the latest graphics cards boasting a theoretical 165GFlops and 54GB/s memory bandwidth spread across 48 ALUs it is easy to see why. The GPU architecture is particularly suited to the parallel stream processing paradigm of low levels of data dependency, high data to instruction ratio and predictable memory access patterns. One largely ignored, yet key, bottleneck for this type of processing on GPUs is both download and readback transfer performance to and from the graphics card. Existing tools provide great developer assistance in many areas of GPU application development, though provide very limited assistance in gaining the best bi-directional data transfer performance. In this paper, we discuss these limitations and present new investigative tools which allow general purpose processing GPU developers to explore the complex array of configuration states which affect both the download and readback performance.