When implementing multimedia applications, designers choose dedicated hardware only when a software solution cannot meet the required performance or energy efficiency. The performance of a hardware design depends critically on high levels of parallelism and data locality. Often a long sequence of high-level transformations is needed to increase locality and parallelism sufficiently. The effect of these transformations is known only after the high-level code has been translated into a specific circuit-level design. When the constraints are not met, hardware designers must redo the high-level loop transformations and repeat all subsequent translation steps, which leads to long design times.

We propose a method to reduce design time through the synergistic combination of techniques (a) to quickly pinpoint the loop transformations that increase locality; (b) to refactor loops in a polyhedral model and check whether a sequence of refactorings is legal; and (c) to generate efficient structural VHDL from the optimized, refactored algorithm.

The implementation of these techniques in a tool suite reduces design time to hours instead of days or weeks. A 2-D inverse discrete wavelet transform was taken as a case study. The results outperform those of a commercial C-to-VHDL compiler and compare favorably with existing published approaches.
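
To make the legality check in step (b) more concrete, the following is a minimal sketch, not the tool suite described here. It assumes the simplest textbook setting: every data dependence in a loop nest is summarized by a constant distance vector, and the refactoring is a permutation of the loops (for example, a loop interchange). Under that assumption, a permutation is legal when every permuted distance vector remains lexicographically non-negative. A full polyhedral framework reasons about dependence relations that are more general than constant vectors. The names `lex_nonnegative`, `permute`, and `permutation_is_legal` are hypothetical and chosen only for this illustration.

```python
from typing import Iterable, Sequence, Tuple

Vector = Tuple[int, ...]


def lex_nonnegative(v: Sequence[int]) -> bool:
    """Return True if the first non-zero entry of v is positive, or v is all zeros."""
    for x in v:
        if x > 0:
            return True
        if x < 0:
            return False
    # All-zero vector: a dependence within one iteration, unaffected by loop order.
    return True


def permute(v: Sequence[int], perm: Sequence[int]) -> Vector:
    """Reorder the entries of v so that new loop level k is old loop level perm[k]."""
    return tuple(v[p] for p in perm)


def permutation_is_legal(distances: Iterable[Sequence[int]],
                         perm: Sequence[int]) -> bool:
    """Assumed criterion: every dependence must still point forward in time
    (lexicographically non-negative) after the loops are permuted."""
    return all(lex_nonnegative(permute(d, perm)) for d in distances)


if __name__ == "__main__":
    interchange = (1, 0)  # swap the outer and inner loop of a 2-deep nest

    # A[i][j] depends on A[i-1][j] and A[i][j-1]: distances (1, 0) and (0, 1).
    print(permutation_is_legal([(1, 0), (0, 1)], interchange))

    # A[i][j] depends on A[i-1][j+1]: distance (1, -1) becomes (-1, 1) when swapped.
    print(permutation_is_legal([(1, -1)], interchange))
```

In the first case the interchange keeps both dependences pointing forward, so it passes the check. In the second case the swapped vector starts with a negative entry, meaning a value would be read before it is produced, so the interchange is rejected. Checking a sequence of refactorings amounts to applying the same test to the composed transformation.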