An exploration methodology for a customizable OpenCL stereo-matching application targeted to an industrial multi-cluster architecture

  • Authors:
  • Edoardo Paone;Gianluca Palermo;Vittorio Zaccaria;Cristina Silvano;Diego Melpignano;Germain Haugou;Thierry Lepley

  • Affiliations:
  • Politecnico di Milano, Milano, Italy;Politecnico di Milano, Milano, Italy;Politecnico di Milano, Milano, Italy;Politecnico di Milano, Milano, Italy;STMicroelectronics, Grenoble, France;STMicroelectronics, Grenoble, France;STMicroelectronics, Grenoble, France

  • Venue:
  • Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Open Computing Language (OpenCL) is emerging as a standard for parallel programming of heterogeneous hardware accelerators. With respect to device specific languages, OpenCL enables application portability but does not guarantee performance portability, eventually requiring additional tuning of the implementation to a specific platform or to unpredictable dynamic workloads. In this paper, we present a methodology to analyze the customization space of an OpenCL application in order to improve performance portability and to support dynamic adaptation. We formulate our case study by implementing an OpenCL image stereo-matching application (which computes the relative depth of objects from a pair of stereo images) customized to the STMicroelectronics Platform 2012 many-core computing fabric. In particular, we use design space exploration techniques to generate a set of operating points that represent specific configurations of the parameters allowing different trade-offs between performance and accuracy of the algorithm itself. These points give detailed knowledge about the interaction between the application parameters, the underlying architecture and the performance of the system; they could also be used by a run-time manager software layer to meet dynamic Quality-of-Service (QoS) constraints. To analyze the customization space, we use cycle-accurate simulations for the target architecture. Since the profiling phase of each configuration takes a long simulation time, we designed our methodology to reduce the overall number of simulations by exploiting some important features of the application parameters; our analysis also enables the identification of the parameters that could be explored on a high-level simulation model to reduce the simulation time. The resulting methodology is one order of magnitude more efficient than an exhaustive exploration and, given its randomized nature, it increases the probability to avoid sub-optimal trade-offs.