Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Coordinated parallelizing compiler optimizations and high-level synthesis
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Compilers: Principles, Techniques, and Tools (2nd Edition)
Compilers: Principles, Techniques, and Tools (2nd Edition)
Lithographic aerial image simulation with FPGA-based hardwareacceleration
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
Liquid Metal: Object-Oriented Programming Across the Hardware/Software Boundary
ECOOP '08 Proceedings of the 22nd European conference on Object-Oriented Programming
Optimus: efficient realization of streaming applications on FPGAs
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs
Languages and Compilers for Parallel Computing
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Fpga-based face detection system using Haar classifiers
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
High-performance, energy-efficient platforms using in-socket FPGA accelerators
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Accelerating Compute-Intensive Applications with GPUs and FPGAs
SASP '08 Proceedings of the 2008 Symposium on Application Specific Processors
A novel SoC architecture on FPGA for ultra fast face detection
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
OpenRCL: Low-Power High-Performance Computing with Reconfigurable Devices
FPL '10 Proceedings of the 2010 International Conference on Field Programmable Logic and Applications
Synthesis of Platform Architectures from OpenCL Programs
FCCM '11 Proceedings of the 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines
High-Level Synthesis for FPGAs: From Prototyping to Deployment
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.00 |
The rise of multicore architectures across all computing domains has opened the door to heterogeneous multiprocessors, where processors of different compute characteristics can be combined to effectively boost the performance per watt of different application kernels. GPUs, in particular, are becoming very popular for speeding up compute-intensive kernels of scientific, imaging, and simulation applications. New programming models that facilitate parallel processing on heterogeneous systems containing GPUs are spreading rapidly in the computing community. By leveraging these investments, the developers of other accelerators have an opportunity to significantly reduce the programming effort by supporting those accelerator models already gaining popularity. In this work, we adapt one such language, the CUDA programming model, into a new FPGA design flow called FCUDA, which efficiently maps the coarse- and fine-grained parallelism exposed in CUDA onto the reconfigurable fabric. Our CUDA-to-FPGA flow employs AutoPilot, an advanced high-level synthesis tool (available from Xilinx) which enables high-abstraction FPGA programming. FCUDA is based on a source-to-source compilation that transforms the SIMT (Single Instruction, Multiple Thread) CUDA code into task-level parallel C code for AutoPilot. We describe the details of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the resulting customized FPGA multicore accelerators. To the best of our knowledge, this is the first CUDA-to-FPGA flow to demonstrate the applicability and potential advantage of using the CUDA programming model for high-performance computing in FPGAs.