Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA

Authors:
Christophe Alias;Alain Darte;Alexandru Plesco
Affiliations:
Compsys, LIP, UMR 5668 CNRS, INRIA, ENS-Lyon, UCB-Lyon, Lyon, France;Compsys, LIP, UMR 5668 CNRS, INRIA, ENS-Lyon, UCB-Lyon, Lyon, France;Compsys, LIP, UMR 5668 CNRS, INRIA, ENS-Lyon, UCB-Lyon, Lyon, France
Venue:
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Year:
2012

Abstract

In the context of the high-level synthesis (HLS) of regular kernels offloaded to an FPGA and communicating with an external DDR memory, we show how to automatically generate suitable communicating processes that optimize the transfer of remote data. This requires a generalized form of communication coalescing in which data can be transferred from the external memory even when this memory is not fully up to date. Experiments with Altera HLS tools demonstrate that this automation, based on advanced polyhedral code analysis and code generation techniques, can be used to map C kernels to FPGA efficiently by generating, entirely at the C level, all the necessary glue (the communication processes), which is then compiled with the same HLS tool as the computation kernel.
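
To make the idea of separate load, compute, and store stages concrete, here is a minimal sketch in plain C. It is not the code generated by the authors' method and it does not implement their generalized coalescing or any polyhedral analysis. It only assumes a one-dimensional kernel split into fixed-size tiles, with each tile moved in one contiguous transfer. All names (`TILE`, `load_tile`, `store_tile`, `compute_tile`, `offloaded_kernel`, `burst_count`) are hypothetical, and `memcpy` merely stands in for a burst access to the external DDR memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE 4  /* hypothetical tile size: elements moved per burst */

/* Counts simulated burst transfers to and from the "remote" memory. */
static int burst_count = 0;

/* Load stage: one contiguous transfer from remote memory to a local buffer. */
static void load_tile(int *local, const int *remote, size_t offset) {
    memcpy(local, remote + offset, TILE * sizeof(int));
    burst_count++;
}

/* Store stage: one contiguous transfer from a local buffer to remote memory. */
static void store_tile(int *remote, const int *local, size_t offset) {
    memcpy(remote + offset, local, TILE * sizeof(int));
    burst_count++;
}

/* Compute stage: touches local buffers only, never the remote memory. */
static void compute_tile(int *out, const int *in) {
    for (size_t i = 0; i < TILE; i++) {
        out[i] = 2 * in[i] + 1;  /* placeholder computation */
    }
}

/* Driver: load, compute, store, one tile at a time (run sequentially here;
   a real design would run the stages as concurrent processes). */
static void offloaded_kernel(int *dst, const int *src, size_t n) {
    int in_buf[TILE];
    int out_buf[TILE];

    assert(n % TILE == 0);  /* sketch assumes n is a multiple of TILE */
    for (size_t t = 0; t < n; t += TILE) {
        load_tile(in_buf, src, t);
        compute_tile(out_buf, in_buf);
        store_tile(dst, out_buf, t);
    }
}
```

Calling `offloaded_kernel(dst, src, 8)` on eight elements performs four transfers (two loads and two stores), whereas element-wise access to the remote memory would need sixteen. Keeping the compute stage free of remote accesses is the point of the design: the transfers can then be scheduled and grouped independently of the computation.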