Throughput-oriented kernel porting onto FPGAs

Authors:
Alexandros Papakonstantinou;Deming Chen;Wen-Mei Hwu;Jason Cong;Yun Liang
Affiliations:
University of Illinois, Urbana-Champaign, IL;University of Illinois, Urbana-Champaign, IL;University of Illinois, Urbana-Champaign, IL;University of California, Los Angeles, California;Peking University, Beijing, China
Venue:
Proceedings of the 50th Annual Design Automation Conference
Year:
2013

Abstract

Reconfigurable devices are often employed in heterogeneous systems because of their low power consumption and parallel processing advantages. An important usability requirement is support for a homogeneous programming interface. Nevertheless, homogeneous programming interfaces do not eliminate the need for code tweaking to enable efficient mapping of the computation across heterogeneous architectures. In this work, we propose a code optimization framework that analyzes and restructures CUDA kernels optimized for GPU devices in order to facilitate synthesis of high-throughput custom accelerators on FPGAs. The proposed framework enables efficient performance porting without manual code tweaking or annotation by the user. A hierarchical region graph, in tandem with code motions and graph coloring of array variables, is employed to restructure the kernel for high-throughput execution on FPGAs.
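
The abstract does not spell out how graph coloring is applied to array variables, so the following is only a generic illustration of the idea, not the authors' algorithm. It assumes each array has a live interval over a sequence of kernel regions, treats two arrays as interfering when their intervals overlap, and uses a simple greedy coloring so that arrays with the same color could share one on-chip buffer. The array names, the interval values, and the helper functions `build_interference` and `greedy_color` are hypothetical.

```python
from itertools import combinations


def build_interference(live_ranges):
    """Build an undirected interference graph.

    live_ranges maps an array name to an inclusive (start, end) interval of
    region indices. Two arrays interfere if their intervals overlap.
    """
    graph = {name: set() for name in live_ranges}
    for a, b in combinations(live_ranges, 2):
        (s1, e1), (s2, e2) = live_ranges[a], live_ranges[b]
        if s1 <= e2 and s2 <= e1:  # intervals overlap
            graph[a].add(b)
            graph[b].add(a)
    return graph


def greedy_color(graph):
    """Give each array the smallest color not used by a colored neighbor.

    Nodes are visited in order of decreasing degree (ties broken by name).
    This is a heuristic: it yields a valid coloring, not always a minimal one.
    """
    colors = {}
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {colors[m] for m in graph[node] if m in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


# Hypothetical arrays and live intervals, chosen only for illustration.
live_ranges = {
    "tile_a": (0, 2),
    "tile_b": (1, 3),
    "partial": (4, 6),
    "result": (5, 7),
}

graph = build_interference(live_ranges)
colors = greedy_color(graph)

for name in sorted(colors):
    print(f"{name}: buffer {colors[name]}")
```

In this toy input, four arrays fit into two shared buffers because `tile_a` and `tile_b` are no longer live when `partial` and `result` are in use. A real flow would derive the live intervals from the kernel's region structure rather than from hand-written numbers.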