Supporting Cache Coherence in Heterogeneous Multiprocessor Systems
Proceedings of the conference on Design, automation and test in Europe - Volume 2
Interrupt Costs in Embedded System with Short Latency Hardware Accelerators
ECBS '08 Proceedings of the 15th Annual IEEE International Conference and Workshop on the Engineering of Computer Based Systems
Multi-Engine Packet Classification Hardware Accelerator
ICCCN '09 Proceedings of the 2009 Proceedings of 18th International Conference on Computer Communications and Networks
A taxonomy of accelerator architectures and their programming models
IBM Journal of Research and Development
Buffer-integrated-Cache: a cost-effective SRAM architecture for handheld and embedded platforms
Proceedings of the 48th Design Automation Conference
FCCM '12 Proceedings of the 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines
Hardware Accelerator for BLAST
MCSOC '12 Proceedings of the 2012 IEEE 6th International Symposium on Embedded Multicore SoCs
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
Cooperation of CPU and hardware accelerator to accomplish computational intensive tasks, provides significant advantages in run-time speed and energy. Efficient management of data sharing among multiple computational kernels can rapidly turn into a complicated problem. The Accelerator coherency port (ACP) emerges as a possible solution by enabling hardware accelerators to issue coherent accesses to the memory space. In this paper, we quantify the advantages of using ACP over the traditional method of sharing data on the DRAM. We select the Xilinx ZYNQ as target and develop an infrastructure to stress the ACP and high-performance (HP) AXI interfaces of the ZYNQ device. Hardware accelerators on both of HP and ACP AXI interfaces reach full duplex data processing bandwidth of over 1.6 GBytes/s running at 125 MHz on a XC7Z020-1C device. The effect of background DRAM and cache traffic on the performance of accelerators is analyzed. For a sample image filtering task, the cooperative operation of CPU and ACP accelerator (CPU-ACP) gains a speed-up of 1.2X over CPU and HP acceleration (CPU-HP). In terms of energy efficiency, an improvement of 2.5 nJ ( 20%) is shown for each byte of processed data. This is the first work which represents detailed practical comparisons on the speed and energy efficiency of various processor-accelerator memory sharing techniques in a configurable heterogeneous platform.