Summed-area table algorithm optimization based on the OpenCL

  • Authors:
  • Shengen Yan;Yunquan Zhang;Guoping Long

  • Affiliations:
  • Institute of Software, Chinese Academy of Sciences, Beijing, China;Institute of Software, Chinese Academy of Sciences, Beijing, China;Institute of Software, Chinese Academy of Sciences, Beijing, China

  • Venue:
  • Proceedings of the ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way?
  • Year:
  • 2012

Abstract

The summed-area table algorithm, also known as the integral image algorithm, is often used to quickly and efficiently compute the sum of values in a rectangular subset of a grid. Our work is based on the OpenCL framework, and we study a range of optimization methods, mainly on AMD GPUs. In this paper, we first implement an efficient prefix sum algorithm. We then describe in detail how to use vectors. We also adopt several other techniques: for instance, a workgroup computes an entire column using a loop, and each workgroup processes multiple columns. The results show that the optimized algorithm performs well on both NVIDIA and AMD platforms. On the NVIDIA Tesla C2050 GPU, we obtain a 33% performance improvement over CUDA NPP. On the AMD HD 5850 platform, the average performance is 4.21 times that of the corresponding CPU function in OpenCV 2.3.
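
For readers unfamiliar with the data structure, the following is a minimal sequential sketch in Python of what a summed-area table is and how it answers rectangle-sum queries. It is not the authors' OpenCL implementation and does not reflect their GPU optimizations; the function names `summed_area_table` and `rect_sum` are hypothetical, chosen only for illustration. It builds the table with a prefix sum along each row followed by a prefix sum down each column, which is the usual reason an efficient prefix sum is the core building block.

```python
from itertools import accumulate


def summed_area_table(grid):
    """Return S where S[y][x] is the sum of grid[0..y][0..x], inclusive."""
    # Pass 1: inclusive prefix sum along each row.
    rows = [list(accumulate(row)) for row in grid]
    # Pass 2: inclusive prefix sum down each column of the row-scanned data.
    for y in range(1, len(rows)):
        for x in range(len(rows[y])):
            rows[y][x] += rows[y - 1][x]
    return rows


def rect_sum(sat, top, left, bottom, right):
    """Sum of the original values in rows top..bottom, columns left..right (inclusive)."""
    total = sat[bottom][right]
    if top > 0:
        total -= sat[top - 1][right]      # remove the strip above the rectangle
    if left > 0:
        total -= sat[bottom][left - 1]    # remove the strip left of the rectangle
    if top > 0 and left > 0:
        total += sat[top - 1][left - 1]   # add back the corner subtracted twice
    return total


grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
sat = summed_area_table(grid)
print(sat[2][2])                  # 45, the sum of the whole grid
print(rect_sum(sat, 1, 1, 2, 2))  # 28, i.e. 5 + 6 + 8 + 9
```

Once the table is built, any rectangle sum costs at most four table lookups regardless of the rectangle's size. The two passes are independent across rows and across columns respectively, which is what makes the construction a natural fit for data-parallel hardware.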