Finding and Applying Loop Transformations for Generating Optimized FPGA Implementations

Authors:
Harald Devos; Kristof Beyls; Mark Christiaens; Jan Van Campenhout; Erik H. D'Hollander; Dirk Stroobandt
Affiliations:
Parallel Information Systems, ELIS-Dept., Faculty of Engineering, Ghent University, Belgium (all authors)
Venue:
Transactions on High-Performance Embedded Architectures and Compilers I
Year:
2007

Abstract

When implementing multimedia applications, solutions in dedicated hardware are chosen only when the required performance or energy-efficiency cannot be met with a software solution. The performance of a hardware design critically depends upon having high levels of parallelism and data locality. Often a long sequence of high-level transformations is needed to sufficiently increase the locality and parallelism. The effect of the transformations is known only after translating the high-level code into a specific design at the circuit level. When the constraints are not met, hardware designers need to redo the high-level loop transformations, and repeat all subsequent translation steps, which leads to long design times.

We propose a method to reduce design time through the synergistic combination of techniques (a) to quickly pinpoint the loop transformations that increase locality; (b) to refactor loops in a polyhedral model and check whether a sequence of refactorings is legal; (c) to generate efficient structural VHDL from the optimized refactored algorithm.

The implementation of these techniques in a tool suite results in a far shorter design time of hours instead of days or weeks. A 2D-inverse discrete wavelet transform was taken as a case study. The results outperform those of a commercial C-to-VHDL compiler, and compare favorably with existing published approaches.
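
To make step (b) more concrete, the following is a minimal sketch of one classical legality test for a sequence of loop transformations. It is not the tool suite described in the abstract. It assumes a two-deep loop nest whose dependences are given as constant distance vectors, and it assumes each transformation is a unimodular matrix. All names (`transform`, `compose`, `is_legal`, `DEPENDENCES`) and the example dependences are hypothetical. Under these assumptions, a sequence is legal when every transformed distance vector is still lexicographically positive, so that the source of each dependence still executes before its target.

```python
from typing import List, Sequence

Vector = Sequence[int]
Matrix = Sequence[Sequence[int]]


def transform(t: Matrix, d: Vector) -> List[int]:
    """Return t * d, the dependence distance vector after the transformation."""
    return [sum(a * x for a, x in zip(row, d)) for row in t]


def compose(second: Matrix, first: Matrix) -> List[List[int]]:
    """Matrix for applying `first` and then `second` (the product second * first)."""
    cols = list(zip(*first))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in second]


def order_preserved(d: Vector) -> bool:
    """True if d is all zeros or its first nonzero entry is positive."""
    for x in d:
        if x != 0:
            return x > 0
    return True  # loop-independent dependence: not affected by iteration reordering


def is_legal(t: Matrix, distances: Sequence[Vector]) -> bool:
    """A transformation is legal if it preserves the order of every dependence."""
    return all(order_preserved(transform(t, d)) for d in distances)


INTERCHANGE = [[0, 1], [1, 0]]  # swap the outer and inner loop
SKEW = [[1, 0], [1, 1]]         # new inner index is i + j

# Hypothetical dependences of a 2-deep nest, as (outer, inner) distances.
DEPENDENCES = [(1, -1), (0, 1)]

print(is_legal(INTERCHANGE, DEPENDENCES))                 # interchange alone
print(is_legal(compose(INTERCHANGE, SKEW), DEPENDENCES))  # skew, then interchange
```

With these example dependences, interchange alone reverses the dependence (1, -1) and is rejected, while skewing first and then interchanging keeps both dependences in order. Checking the composed matrix once, rather than each step separately, is what lets a whole sequence be validated even when an intermediate step would be illegal on its own.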