Energy and performance exploration of accelerator coherency port using Xilinx ZYNQ

Authors:
Mohammadsadegh Sadri; Christian Weis; Norbert Wehn; Luca Benini
Affiliations:
University of Bologna, Italy; University of Kaiserslautern, Germany; University of Kaiserslautern, Germany; University of Bologna, Italy
Venue:
Proceedings of the 10th FPGAworld Conference
Year:
2013

Abstract

Cooperation between a CPU and a hardware accelerator on computationally intensive tasks provides significant advantages in run-time speed and energy. However, efficient management of data sharing among multiple computational kernels can quickly become a complicated problem. The Accelerator Coherency Port (ACP) emerges as a possible solution by enabling hardware accelerators to issue coherent accesses to the memory space. In this paper, we quantify the advantages of using the ACP over the traditional method of sharing data through DRAM. We select the Xilinx ZYNQ as the target and develop an infrastructure to stress the ACP and high-performance (HP) AXI interfaces of the ZYNQ device. Hardware accelerators on both the HP and ACP AXI interfaces reach a full-duplex data processing bandwidth of over 1.6 GBytes/s running at 125 MHz on an XC7Z020-1C device. We also analyze the effect of background DRAM and cache traffic on the performance of the accelerators. For a sample image filtering task, the cooperative operation of the CPU and an ACP accelerator (CPU-ACP) achieves a speed-up of 1.2X over the CPU with HP acceleration (CPU-HP). In terms of energy efficiency, CPU-ACP shows an improvement of 2.5 nJ (about 20%) for each byte of processed data. This is the first work to present detailed practical comparisons of the speed and energy efficiency of various processor-accelerator memory sharing techniques on a configurable heterogeneous platform.
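As a rough consistency check on the bandwidth and energy figures in the abstract, the following Python sketch computes the theoretical full-duplex bandwidth of a single accelerator port and backs out the per-byte energies implied by the reported saving. It rests on assumptions that the abstract does not state: a 64-bit (8-byte) AXI data width, one data beat per clock on each of the independent read and write channels, decimal units (1 GByte = 1e9 bytes), that the 1.6 GBytes/s figure counts read and write traffic together, and that the 20% saving is measured relative to CPU-HP. The helper names are hypothetical and are not taken from the paper.

```python
# Back-of-envelope check of the figures quoted in the abstract.
# ASSUMPTIONS (not stated in the abstract): 64-bit AXI data width, one beat per
# clock on both the read and write channels, decimal GBytes, 1.6 GB/s = read + write,
# and the 20% energy saving is relative to the CPU-HP baseline.

CLOCK_HZ = 125e6      # accelerator clock reported in the abstract
BUS_BYTES = 8         # assumed 64-bit AXI data width
MEASURED_GBPS = 1.6   # reported full-duplex bandwidth (a lower bound: "over 1.6")


def peak_full_duplex_gbps(clock_hz: float, bus_bytes: int) -> float:
    """Theoretical read + write bandwidth in GBytes/s (hypothetical helper)."""
    per_direction = clock_hz * bus_bytes   # bytes/s on one channel (read or write)
    return 2 * per_direction / 1e9         # read and write channels run concurrently


def implied_energy_per_byte(saving_nj: float, saving_fraction: float) -> tuple:
    """Return (baseline, improved) energy per byte in nJ (hypothetical helper),
    assuming saving_fraction is taken relative to the baseline."""
    baseline = saving_nj / saving_fraction
    return baseline, baseline - saving_nj


peak = peak_full_duplex_gbps(CLOCK_HZ, BUS_BYTES)
utilization = MEASURED_GBPS / peak
e_hp, e_acp = implied_energy_per_byte(2.5, 0.20)

print(f"peak full-duplex bandwidth: {peak:.1f} GB/s")
print(f"reported bandwidth is at least {utilization:.0%} of peak")
print(f"implied energy per byte: CPU-HP ~{e_hp:.1f} nJ, CPU-ACP ~{e_acp:.1f} nJ")
```

Under these assumptions the theoretical peak is 2.0 GBytes/s, so the reported bandwidth corresponds to roughly 80% of the maximum, and a 2.5 nJ saving that equals 20% implies roughly 12.5 nJ per byte for CPU-HP versus roughly 10 nJ per byte for CPU-ACP. If the port width or the baseline for the percentage differs, these derived values change accordingly.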