Parallelization Approaches for Hardware Accelerators --- Loop Unrolling Versus Loop Partitioning

  • Authors:
  • Frank Hannig; Hritam Dutta; Jürgen Teich

  • Affiliations:
  • Hardware/Software Co-Design, Department of Computer Science, University of Erlangen-Nuremberg, Germany (all authors)

  • Venue:
  • ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems
  • Year:
  • 2009

Abstract

State-of-the-art behavioral synthesis tools offer few high-level transformations for achieving highly parallelized implementations. Those that offer any apply loop unrolling to obtain higher throughput. In this paper, we employ the PARO behavioral synthesis tool, which has the unique ability to perform both loop unrolling and loop partitioning. Loop unrolling replicates the loop kernel and exposes its parallelism for hardware implementation, whereas partitioning tiles the loop program onto a regular array of tightly coupled processing elements. Using the same design tool for both variants enables, for the first time, a quantitative evaluation of the two approaches for reconfigurable architectures, with the help of computationally intensive algorithms selected from different benchmarks. Superlinear speedups in terms of throughput are achieved for the processor array approach. In addition, area and power costs are reduced.
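
The difference between the two transformations can be illustrated with a small software model. The following Python sketch is not PARO output and does not claim to reproduce its partitioning scheme; the FIR kernel, the function names, and the parameters `factor` and `num_pes` are assumptions chosen only for illustration. The unrolled variant replicates the loop body, while the partitioned variant tiles the inner loop onto a hypothetical chain of processing elements (PEs) that hand a partial sum from one to the next.

```python
# Illustrative software model only; names and kernel are assumptions, not PARO's API.

def fir_reference(x, w):
    """Original loop nest: y[i] = sum over j of w[j] * x[i + j]."""
    n_out = len(x) - len(w) + 1
    y = [0] * n_out
    for i in range(n_out):
        for j in range(len(w)):
            y[i] += w[j] * x[i + j]
    return y


def fir_unrolled(x, w, factor=2):
    """Outer loop unrolled by `factor`: the kernel body is replicated so that
    `factor` independent outputs are exposed in each iteration."""
    n_out = len(x) - len(w) + 1
    y = [0] * n_out
    main = n_out - n_out % factor
    for i in range(0, main, factor):
        for u in range(factor):  # replicated kernel copies (parallel in hardware)
            for j in range(len(w)):
                y[i + u] += w[j] * x[i + u + j]
    for i in range(main, n_out):  # remainder iterations that do not fill a group
        for j in range(len(w)):
            y[i] += w[j] * x[i + j]
    return y


def fir_partitioned(x, w, num_pes=2):
    """Inner loop tiled onto `num_pes` PEs: tiles run one after another, and
    inside a tile each PE adds one product to a partial sum passed along."""
    n_out = len(x) - len(w) + 1
    y = [0] * n_out
    for i in range(n_out):
        for j0 in range(0, len(w), num_pes):  # sequential tiles
            partial = y[i]
            for pe in range(min(num_pes, len(w) - j0)):  # PEs within one tile
                partial += w[j0 + pe] * x[i + j0 + pe]
            y[i] = partial
    return y


if __name__ == "__main__":
    x = list(range(10))
    w = [1, 2, 3]
    # All three variants compute the same result; only the schedule differs.
    print(fir_reference(x, w) == fir_unrolled(x, w) == fir_partitioned(x, w))
```

In this model the three functions are functionally equivalent; what changes is how iterations are grouped, which in hardware determines how many kernel copies or PEs are instantiated and how data moves between them.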