An affine partitioning algorithm to maximize parallelism and minimize communication
ICS '99 Proceedings of the 13th international conference on Supercomputing
Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests
International Journal of Parallel Programming
Compiler-generated communication for pipelined FPGA applications
Proceedings of the 40th annual Design Automation Conference
Behavior and communication co-optimization for systems with sequential communication media
Proceedings of the 43rd annual Design Automation Conference
DRDU: A data reuse analysis technique for efficient scratch-pad memory management
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A Data-Driven Approach for Pipelining Sequences of Data-Dependent Loops
FCCM '07 Proceedings of the 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
A practical automatic polyhedral parallelizer and locality optimizer
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
High-Level Synthesis: from Algorithm to Digital Circuit
High-Level Synthesis: from Algorithm to Digital Circuit
CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
Loop transformations: convexity, pruning and optimization
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
LegUp: high-level synthesis for FPGA-based processor/accelerator systems
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Customizable Domain-Specific Computing
IEEE Design & Test
Multilevel Granularity Parallelism Synthesis on FPGAs
FCCM '11 Proceedings of the 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines
Accelerating Fluid Registration Algorithm on Multi-FPGA Platforms
FPL '11 Proceedings of the 2011 21st International Conference on Field Programmable Logic and Applications
Optimizing SDRAM bandwidth for custom FPGA loop accelerators
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
High-level synthesis: productivity, performance, and software constraints
Journal of Electrical and Computer Engineering - Special issue on ESL Design Methodology
Optimizing memory hierarchy allocation with loop transformations for high-level synthesis
Proceedings of the 49th Annual Design Automation Conference
High-Level Synthesis for FPGAs: From Prototyping to Deployment
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Memory partitioning and scheduling co-optimization in behavioral synthesis
Proceedings of the International Conference on Computer-Aided Design
Polyhedral-based data reuse optimization for configurable computing
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Polyhedral-based data reuse optimization for configurable computing
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Improving polyhedral code generation for high-level synthesis
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Hi-index | 0.00 |
High level synthesis (HLS) is an important enabling technology for the adoption of hardware accelerator technologies. It promises the performance and energy efficiency of hardware designs with a lower barrier to entry in design expertise, and shorter design time. State-of-the-art high level synthesis now includes a wide variety of powerful optimizations that implement efficient hardware. These optimizations can implement some of the most important features generally performed in manual designs including parallel hardware units, pipelining of execution both within a hardware unit and between units, and fine-grained data communication. We may generally classify the optimizations as those that optimize hardware implementation within a code block (intra-block) and those that optimize communication and pipelining between code blocks (inter-block). However, both optimizations are in practice difficult to apply. Real-world applications contain data-dependent blocks of code and communicate through complex data access patterns. Existing high level synthesis tools cannot apply these powerful optimizations unless the code is inherently compatible, severely limiting the optimization opportunity. In this paper we present an integrated framework to model and enable both intra- and inter-block optimizations. This integrated technique substantially improves the opportunity to use the powerful HLS optimizations that implement parallelism, pipelining, and fine-grained communication. Our polyhedral model-based technique systematically defines a set of data access patterns, identifies effective data access patterns, and performs the loop transformations to enable the intra- and inter-block optimizations. Our framework automatically explores transformation options, performs code transformations, and inserts the appropriate HLS directives to implement the HLS optimizations. Furthermore, our framework can automatically generate the optimized communication blocks for fine-grained communication between hardware blocks. Experimental evaluation demonstrates that we can achieve an average of 6.04X speedup over the high level synthesis solution without our transformations to enable intra- and inter-block optimizations.