Separable 2D convolution with polymorphic register files

  • Authors:
  • Cătălin B. Ciobanu; Georgi N. Gaydadjiev

  • Affiliations:
  • Computer Engineering Laboratory, EEMCS, Delft University of Technology, The Netherlands, Department of Computer Science and Engineering, Chalmers University of Technology, Sweden; Computer Engineering Laboratory, EEMCS, Delft University of Technology, The Netherlands, Department of Computer Science and Engineering, Chalmers University of Technology, Sweden

  • Venue:
  • ARCS'13: Proceedings of the 26th International Conference on Architecture of Computing Systems
  • Year:
  • 2013

Abstract

This paper studies the performance of separable 2D convolution on multi-lane Polymorphic Register Files (PRFs). We present a matrix transposition algorithm optimized for PRFs, and a vectorized 2D convolution algorithm that avoids strided memory accesses. We compare the throughput of our PRF to that of the NVIDIA Tesla C2050 GPU. The results show that even in bandwidth-constrained systems, multi-lane PRFs can outperform the GPU for mask sizes of 9 × 9 or larger.
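
The abstract does not give the algorithms themselves, so the following is only a minimal scalar sketch of the general idea it names, not the authors' PRF implementation. It assumes that the column pass of a separable convolution can be turned into a second row pass by transposing the intermediate result, so that every inner loop walks contiguous elements rather than striding down a column. The function names (`conv_rows`, `transpose`, `separable_conv2d`), the zero padding at the borders, and the same-size output are all assumptions made for illustration.

```python
def conv_rows(image, kernel):
    """Convolve every row with a 1D kernel (zero padding, same-size output)."""
    r = len(kernel) // 2  # kernel radius; an odd-length kernel is assumed
    out = []
    for row in image:
        n = len(row)
        new_row = []
        for x in range(n):
            acc = 0
            for k, w in enumerate(kernel):
                j = x + r - k  # true convolution: the kernel is flipped
                if 0 <= j < n:  # samples outside the row count as zero
                    acc += w * row[j]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(matrix):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def separable_conv2d(image, row_kernel, col_kernel):
    """Separable 2D convolution as two row passes around two transposes."""
    tmp = conv_rows(image, row_kernel)   # horizontal pass, unit-stride reads
    tmp = transpose(tmp)                 # columns become rows
    tmp = conv_rows(tmp, col_kernel)     # vertical pass, again unit-stride
    return transpose(tmp)                # restore the original orientation


if __name__ == "__main__":
    impulse = [[0, 0, 0],
               [0, 1, 0],
               [0, 0, 0]]
    for line in separable_conv2d(impulse, [1, 2, 1], [1, 2, 1]):
        print(line)
```

Convolving a unit impulse with the separable mask reproduces the outer product of the two 1D kernels, which is a quick sanity check that the two row passes together form the intended 2D mask. In this sketch a K × K mask costs about 2K multiply-adds per output element instead of K², which is the usual motivation for the separable form.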