Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators
Journal of VLSI Signal Processing Systems
Swing Modulo Scheduling: A Lifetime-Sensitive Approach
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
FCCM '06 Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Chip multi-processor generator
Proceedings of the 44th annual Design Automation Conference
A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads
IISWC '10 Proceedings of the IEEE International Symposium on Workload Characterization (IISWC'10)
OpenRCL: Low-Power High-Performance Computing with Reconfigurable Devices
FPL '10 Proceedings of the 2010 International Conference on Field Programmable Logic and Applications
GLOpenCL: OpenCL support on hardware- and software-managed cache multicores
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
LegUp: high-level synthesis for FPGA-based processor/accelerator systems
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
A methodology for efficient use of OpenCL, ESL and FPGAs in multi-core architectures
Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
Exploiting Task- and Data-Level Parallelism in Streaming Applications Implemented in FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Hi-index | 0.00 |
The problem of automatically generating hardware modules from high level application representations has been at the forefront of EDA research during the last few years. In this paper, we introduce a methodology to automatically synthesize hardware accelerators from OpenCL applications. OpenCL is a recent industry supported standard for writing programs that execute on multicore platforms and accelerators such as GPUs. Our methodology maps OpenCL kernels into hardware accelerators, based on architectural templates that explicitly decouple computation from memory communication whenever this is possible. The templates can be tuned to provide a wide repertoire of accelerators that meet user performance requirements and FPGA device characteristics. Furthermore, a set of high- and low-level compiler optimizations is applied to generate optimized accelerators. Our experimental evaluation shows that the generated accelerators are tuned efficiently to match the applications memory access pattern and computational complexity, and to achieve user performance requirements. An important objective of our tool is to expand the FPGA development user base to software engineers, thereby expanding the scope of FPGAs beyond the realm of hardware design.