Energy and performance exploration of accelerator coherency port using Xilinx ZYNQ

  • Authors:
  • Mohammadsadegh Sadri;Christian Weis;Norbert Wehn;Luca Benini

  • Affiliations:
  • University of Bologna, Italy;University of Kaiserslautern, Germany;University of Kaiserslautern, Germany;University of Bologna, Italy

  • Venue:
  • Proceedings of the 10th FPGAworld Conference
  • Year:
  • 2013

Abstract

Cooperation between a CPU and a hardware accelerator on computationally intensive tasks provides significant advantages in run-time speed and energy. Efficient management of data sharing among multiple computational kernels can rapidly turn into a complicated problem. The accelerator coherency port (ACP) emerges as a possible solution by enabling hardware accelerators to issue coherent accesses to the memory space. In this paper, we quantify the advantages of using the ACP over the traditional method of sharing data through DRAM. We select the Xilinx ZYNQ as the target and develop an infrastructure to stress the ACP and high-performance (HP) AXI interfaces of the ZYNQ device. Hardware accelerators on both the HP and ACP AXI interfaces reach a full-duplex data processing bandwidth of over 1.6 GBytes/s running at 125 MHz on an XC7Z020-1C device. We analyze the effect of background DRAM and cache traffic on accelerator performance. For a sample image filtering task, the cooperative operation of the CPU and an ACP accelerator (CPU-ACP) gains a speed-up of 1.2X over CPU and HP acceleration (CPU-HP). In terms of energy efficiency, an improvement of 2.5 nJ (20%) is shown for each byte of processed data. This is the first work to present detailed practical comparisons of the speed and energy efficiency of various processor-accelerator memory sharing techniques in a configurable heterogeneous platform.
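To put the reported bandwidth in context, the short sketch below estimates the theoretical peak of an AXI interface and compares it with the 1.6 GBytes/s figure. The abstract does not state the data width, so the 64-bit width is an assumption here, as is reading "GBytes" as 10^9 bytes. The function name `axi_peak_bytes_per_s` is hypothetical and used only for illustration.

```python
def axi_peak_bytes_per_s(clock_hz, data_width_bits, duplex=True):
    """Theoretical peak throughput of an AXI port, one data beat per clock cycle.

    Assumes no protocol overhead; with duplex=True the read and write
    channels are counted as transferring simultaneously.
    """
    per_direction = clock_hz * data_width_bits // 8  # bytes per second, one direction
    return per_direction * (2 if duplex else 1)


# 125 MHz is taken from the abstract; the 64-bit width is an assumption.
peak = axi_peak_bytes_per_s(125_000_000, 64)

# 1.6e9 is the reported lower bound ("over 1.6 GBytes/s"), read as decimal gigabytes.
utilization = 1.6e9 / peak

print(f"peak full-duplex bandwidth: {peak / 1e9:.1f} GB/s")
print(f"reported bandwidth is at least {utilization:.0%} of that peak")
```

Under these assumptions, the accelerators would be sustaining at least four fifths of the interface's theoretical full-duplex peak. A different data width or a binary reading of "GBytes" would change that ratio.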