High-Level Synthesis for FPGAs: From Prototyping to Deployment

Authors:
J. Cong;Bin Liu;S. Neuendorffer;J. Noguera;K. Vissers;Zhiru Zhang
Affiliations:
AutoESL Design Technol., Inc., Los Angeles, CA, USA;-;-;-;-;-
Venue:
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Year:
2011

Cited 33

Combined loop transformation and hierarchy allocation for data reuse optimization

Proceedings of the International Conference on Computer-Aided Design
Impact of FPGA architecture on resource sharing in high-level synthesis

Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
FPGA-accelerated 3D reconstruction using compressive sensing

Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Towards layout-friendly high-level synthesis

Proceedings of the 2012 ACM International Symposium on Physical Design
Architecture support for accelerator-rich CMPs

Proceedings of the 49th Annual Design Automation Conference
Optimizing memory hierarchy allocation with loop transformations for high-level synthesis

Proceedings of the 49th Annual Design Automation Conference
BiN: a buffer-in-NUCA scheme for accelerator-rich CMPs

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Communication and memory architecture design of application-specific high-end multiprocessors

VLSI Design
Scalable communication architectures for massively parallel hardware multi-processors

Journal of Parallel and Distributed Computing
Finite state machine optimization in FPGAs

Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology
Stream arbitration: Towards efficient bandwidth utilization for emerging on-chip interconnects

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Memory partitioning and scheduling co-optimization in behavioral synthesis

Proceedings of the International Conference on Computer-Aided Design
Improving high level synthesis optimization opportunity through polyhedral transformations

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Polyhedral-based data reuse optimization for configurable computing

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Multi-pumping for resource reduction in FPGA high-level synthesis

Proceedings of the Conference on Design, Automation and Test in Europe
Utilizing voltage-frequency islands in C-to-RTL synthesis for streaming applications

Proceedings of the Conference on Design, Automation and Test in Europe
Handling design and implementation optimizations in equivalence checking for behavioral synthesis

Proceedings of the 50th Annual Design Automation Conference
Combining module selection and replication for throughput-driven streaming programs

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Efficient compilation of CUDA kernels for high-performance computing on FPGAs

ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
Theory and algorithm for generalized memory partitioning in high-level synthesis

Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
Bound-oriented parallel pruning approaches for efficient resource constrained scheduling of high-level synthesis

Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Improving polyhedral code generation for high-level synthesis

Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
SDC-based modulo scheduling for pipeline synthesis

Proceedings of the International Conference on Computer-Aided Design
FPGA simulation engine for customized construction of neural microcircuits

Proceedings of the International Conference on Computer-Aided Design
Optimization of interconnects between accelerators and shared memories in dark silicon

Proceedings of the International Conference on Computer-Aided Design
Design of massively parallel hardware multi-processors for highly-demanding embedded applications

Microprocessors & Microsystems
An impulse-c hardware accelerator for packet classification based on fine/coarse grain optimization

International Journal of Reconfigurable Computing
Rainbow: an operating system for software-hardware multitasking on dynamically partially reconfigurable FPGAs

International Journal of Reconfigurable Computing
A heterogeneous computing system for coupling 3D endomicroscopy with volume rendering in real-time image visualization

Computers in Industry
Software-programmable digital pre-distortion on new generation FPGAs

Analog Integrated Circuits and Signal Processing
Hardware-software optimizations of reconfigurable multi-core processors for floating-point computations of large sparse matrices

Journal of Real-Time Image Processing
A practical evaluation of the performance of the Impulse CoDeveloper HLS tool for implementing large-kernel 2-D filters

Journal of Real-Time Image Processing
Processor architecture exploration and synthesis of massively parallel multi-processor accelerators in application to LDPC decoding

Microprocessors & Microsystems

Abstract

Escalating system-on-chip design complexity is pushing the design community to raise the level of abstraction beyond register transfer level. Despite the unsuccessful adoption of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to HLS methodology is happening now, especially for field-programmable gate array (FPGA) designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advancement in core HLS algorithms, and a domain-specific approach. In this paper, we use AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx as an example to demonstrate the effectiveness of state-of-the-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including comparison of HLS solutions versus optimized manual designs. In particular, the experiment on a sphere decoder shows that the HLS solution can achieve an 11-31% reduction in FPGA resource usage with improved design productivity compared to a hand-coded design.