Improvements to graph coloring register allocation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Extracting task-level parallelism
ACM Transactions on Programming Languages and Systems (TOPLAS)
Advanced compiler design and implementation
Advanced compiler design and implementation
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
The range test: a dependence test for symbolic, non-linear expressions
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Register allocation and spilling via graph coloring
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Coordinated parallelizing compiler optimizations and high-level synthesis
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Synthesis of Platform Architectures from OpenCL Programs
FCCM '11 Proceedings of the 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines
Multilevel Granularity Parallelism Synthesis on FPGAs
FCCM '11 Proceedings of the 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines
Correctly Treating Synchronizations in Compiling Fine-Grained SPMD-Threaded Programs for CPU
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
OmpSs@Zynq all-programmable SoC ecosystem
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
Hi-index | 0.00 |
Reconfigurable devices are often employed in heterogeneous systems due to their low power and parallel processing advantages. An important usability requirement is the support of a homogeneous programming interface. Nevertheless, homogeneous programming interfaces do not eliminate the need for code tweaking to enable efficient mapping of the computation across heterogeneous architectures. In this work we propose a code optimization framework which analyzes and restructures CUDA kernels that are optimized for GPU devices in order to facilitate synthesis of high-throughput custom accelerators on FPGAs. The proposed framework enables efficient performance porting without manual code tweaking or annotation by the user. A hierarchical region graph in tandem with code motions and graph coloring of array variables is employed to restructure the kernel for high throughput execution on FPGAs.