Analyzing program flow within a many-kernel OpenCL application

  • Authors: Perhaad Mistry; Chris Gregg; Norman Rubin; David Kaeli; Kim Hazelwood

  • Affiliations: Northeastern University, Boston, MA; University of Virginia, Charlottesville, VA; Advanced Micro Devices, Boxborough, MA; Northeastern University, Boston, MA; University of Virginia, Charlottesville, VA

  • Venue: Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
  • Year: 2011

Abstract

Many developers have begun to realize that heterogeneous multi-core and many-core computer systems can provide significant performance opportunities to a range of applications. Typical applications possess multiple components that can be parallelized; developers need to be equipped with proper performance tools to analyze program flow and identify application bottlenecks. In this paper, we analyze and profile the components of the Speeded Up Robust Features (SURF) computer vision algorithm written in OpenCL. Our profiling framework is developed using built-in OpenCL API function calls, without the need for an external profiler. We show that we can begin to identify performance bottlenecks and performance issues present in individual components on different hardware platforms. We demonstrate that run-time profiling based on the facilities of the OpenCL specification can give an application developer a fine-grained look at performance, and that this information can be used to tailor performance improvements for specific platforms.
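As an illustration of the kind of per-kernel accounting that event-based profiling makes possible, here is a minimal sketch in C. It is not the authors' framework. It assumes each kernel launch has already been reduced to four nanosecond timestamps, as could be read with `clGetEventProfilingInfo` (`CL_PROFILING_COMMAND_QUEUED`, `_SUBMIT`, `_START`, `_END`) on a command queue created with `CL_QUEUE_PROFILING_ENABLE`. The `EventTimes` and `KernelTotals` types and the `summarize` function are hypothetical names introduced here, and the OpenCL calls themselves are left out so the sketch stays self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one kernel launch: the four timestamps (ns)
 * that OpenCL event profiling reports for a command. */
typedef struct {
    const char *kernel;              /* kernel name, chosen by the caller */
    unsigned long long queued;       /* command enqueued by the host      */
    unsigned long long submitted;    /* command submitted to the device   */
    unsigned long long started;      /* execution began on the device     */
    unsigned long long ended;        /* execution finished on the device  */
} EventTimes;

/* Hypothetical per-kernel summary accumulated over all launches. */
typedef struct {
    const char *kernel;
    unsigned long long exec_ns;      /* sum of (ended - started)  */
    unsigned long long wait_ns;      /* sum of (started - queued) */
    unsigned calls;                  /* number of launches seen   */
} KernelTotals;

/* Group launches by kernel name and accumulate execution and wait time.
 * Returns the number of distinct kernels written to `totals`.
 * Launches of kernels that do not fit in `cap` slots are skipped. */
size_t summarize(const EventTimes *events, size_t n,
                 KernelTotals *totals, size_t cap)
{
    size_t used = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Linear search is adequate for a handful of kernels. */
        size_t k = 0;
        while (k < used && strcmp(totals[k].kernel, events[i].kernel) != 0)
            ++k;
        if (k == used) {
            if (used == cap)
                continue;            /* no room for a new kernel entry */
            totals[used].kernel = events[i].kernel;
            totals[used].exec_ns = 0;
            totals[used].wait_ns = 0;
            totals[used].calls = 0;
            ++used;
        }
        totals[k].exec_ns += events[i].ended - events[i].started;
        totals[k].wait_ns += events[i].started - events[i].queued;
        totals[k].calls += 1;
    }
    return used;
}
```

Separating time spent executing from time spent waiting between enqueue and start is one way such a summary could point at a bottleneck: a kernel with a small `exec_ns` but a large `wait_ns` is likely stalled behind other work rather than slow in itself.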