Performance Evaluation and Optimization Mechanisms for Inter-operable Graphics and Computation on GPUs

Authors:
Yash Ukidave; Xiang Gong; David Kaeli
Affiliations:
Electrical and Computer Engineering, Northeastern University, Boston, MA (all authors)
Venue:
Proceedings of Workshop on General Purpose Processing Using GPUs
Year:
2014

Abstract

Graphics Processing Units (GPUs) are recognized as the primary accelerators for graphics rendering in gaming. They have also been widely adopted as the computing platform of choice in many scientific and high-performance computing domains. Applications in a range of domains use the parallelism of GPUs to process compute and graphics workloads simultaneously. Programming standards such as OpenCL and OpenGL allow compute and graphics to interoperate within the same application. However, emerging scientific visualization and immersive gaming applications place increasing demands on both compute and graphics. Under these demands, efficiency degrades because of the continual switching between compute and graphics, which swaps their associated runtime environments in and out. We need to better understand how to tune this interoperable environment so that compute and graphics can run both efficiently and simultaneously. Currently, each of these domains is evaluated in isolation. In this paper, we evaluate the performance and efficiency of the OpenCL-OpenGL (CL-GL) interoperability (interop) mode. We explore different methods to improve the execution performance of CL-GL interop-based applications, and we propose a slot-based rendering mechanism for CL-GL interop that increases application efficiency. To evaluate CL-GL interop and our slot-based scheme, we study five scientific applications that use OpenCL for compute and OpenGL for graphics rendering. Our test platforms are two discrete AMD Radeon GPUs and one shared-memory AMD APU. We show that using the CL-GL interop interface yields a 2.2X performance increase. Our slot-based rendering provides a 60% performance increase, enabled by a 24% improvement in L2 cache hit rate, on both the GPUs and the APU.
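To make the data-movement argument for interop concrete, the following is a minimal, hypothetical per-frame cost model. It is not taken from the paper and does not reproduce its measurements. It assumes a compute-then-render loop in which an OpenCL kernel produces a buffer that OpenGL then draws. Without interop, the result is read back to the host (for example with `clEnqueueReadBuffer`) and re-uploaded to OpenGL (for example with `glBufferSubData`), so the buffer crosses the host-device boundary twice per frame. With interop, the buffer is shared once through `clCreateFromGLBuffer`, and each frame only brackets the kernel with `clEnqueueAcquireGLObjects` and `clEnqueueReleaseGLObjects`. The model assumes the driver performs no implicit copy in that case. The names `FramePath`, `bytes_moved_per_frame`, and `bytes_moved`, and the particle workload, are illustrative only.

```python
from enum import Enum


class FramePath(Enum):
    """How a buffer produced by an OpenCL kernel reaches OpenGL each frame."""
    COPY = "copy"        # read back to host, then re-upload to OpenGL
    INTEROP = "interop"  # shared CL-GL buffer, acquire/release only


def bytes_moved_per_frame(path: FramePath, buffer_bytes: int) -> int:
    """Hypothetical count of bytes crossing the host-device boundary per frame."""
    if buffer_bytes < 0:
        raise ValueError("buffer_bytes must be non-negative")
    if path is FramePath.COPY:
        # Device -> host (clEnqueueReadBuffer) plus host -> device (glBufferSubData).
        return 2 * buffer_bytes
    # Assumption: clEnqueueAcquireGLObjects / clEnqueueReleaseGLObjects only hand
    # ownership of the shared buffer between the APIs; the driver copies nothing.
    return 0


def bytes_moved(path: FramePath, buffer_bytes: int, frames: int) -> int:
    """Total modeled traffic over a run of `frames` frames."""
    if frames < 0:
        raise ValueError("frames must be non-negative")
    return frames * bytes_moved_per_frame(path, buffer_bytes)


if __name__ == "__main__":
    # Illustrative workload (hypothetical): 1M particles, one float4 position each.
    particle_bytes = 1_000_000 * 4 * 4  # 16,000,000 bytes
    for path in FramePath:
        mib = bytes_moved(path, particle_bytes, frames=60) / 2**20
        print(f"{path.value:>8}: {mib:.1f} MiB moved over 60 frames")
```

For this workload, the copy path moves 2 × 16 MB × 60 = 1.92 GB over 60 frames. Under the zero-copy assumption, the interop path moves nothing. The model deliberately ignores synchronization cost, such as the `glFinish`/`clFinish` calls typically issued around acquire and release. That cost corresponds to the compute/graphics switching overhead identified above, so this sketch captures only the data-movement side of the trade-off.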