Analyzing program flow within a many-kernel OpenCL application

Authors:
Perhaad Mistry; Chris Gregg; Norman Rubin; David Kaeli; Kim Hazelwood
Affiliations:
Northeastern University, Boston, MA; University of Virginia, Charlottesville, VA; Advanced Micro Devices, Boxborough, MA; Northeastern University, Boston, MA; University of Virginia, Charlottesville, VA
Venue:
Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
Year:
2011

Abstract

Many developers have begun to realize that heterogeneous multi-core and many-core computer systems can provide significant performance opportunities to a range of applications. Typical applications possess multiple components that can be parallelized, so developers need proper performance tools to analyze program flow and identify application bottlenecks. In this paper, we analyze and profile the components of the Speeded Up Robust Features (SURF) computer vision algorithm written in OpenCL. Our profiling framework is built on OpenCL API function calls and needs no external profiler. We show that we can begin to identify performance bottlenecks in individual components on different hardware platforms. We demonstrate that run-time profiling through the facilities defined in the OpenCL specification gives an application developer a fine-grained view of performance, and that this information can be used to tailor performance improvements to specific platforms.
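As an illustration of the kind of per-kernel accounting the abstract describes, the following is a minimal sketch and not the authors' framework. It assumes each kernel launch has already produced a start and end timestamp in nanoseconds, as the OpenCL call clGetEventProfilingInfo reports for CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END on a command queue created with CL_QUEUE_PROFILING_ENABLE. The kernel names, the timestamps, and the summarize helper are hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical event records: (kernel name, start_ns, end_ns).
# In a real OpenCL host program these values would be read back with
# clGetEventProfilingInfo; here they are made-up numbers for illustration.
EVENTS = [
    ("hessian", 1_000, 5_000),
    ("non_max", 5_500, 6_500),
    ("hessian", 7_000, 10_000),
    ("descriptor", 10_500, 18_500),
]


def summarize(events):
    """Aggregate device time per kernel, sorted by total time (largest first).

    Returns a list of (name, calls, total_ns, share_of_total) tuples.
    """
    totals = defaultdict(int)
    calls = defaultdict(int)
    for name, start_ns, end_ns in events:
        if end_ns < start_ns:
            raise ValueError(f"event for {name!r} ends before it starts")
        totals[name] += end_ns - start_ns
        calls[name] += 1

    grand_total = sum(totals.values())
    rows = []
    for name, total_ns in totals.items():
        share = total_ns / grand_total if grand_total else 0.0
        rows.append((name, calls[name], total_ns, share))

    # The kernel with the most accumulated time is the first bottleneck candidate.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    for name, n_calls, total_ns, share in summarize(EVENTS):
        print(f"{name:12s} calls={n_calls} total={total_ns} ns share={share:.1%}")
```

Running the same summary on timestamps collected from different devices would show how each kernel's share of the total shifts from one platform to another, which is the comparison the abstract points to.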