Polyhedral-based data reuse optimization for configurable computing

Authors:
Louis-Noel Pouchet;Peng Zhang;P. Sadayappan;Jason Cong
Affiliations:
University of California Los Angeles, Los Angeles, CA, USA;University of California Los Angeles, Los Angeles, CA, USA;Ohio State University, Columbus, OH, USA;University of California Los Angeles, Los Angeles, CA, USA
Venue:
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Year:
2013

Citing 27
Cited 2

Data-centric multi-level blocking

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Tiling imperfectly-nested loop nests

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
A compiler approach to fast hardware design space exploration in FPGA-based systems

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Compiler-directed scratch pad memory hierarchy design and management

Proceedings of the 39th annual Design Automation Conference
Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design

Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design
Layer Assignment echniques for Low Energy in Multi-Layered Memory Organisations

DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Code Generation in the Polyhedral Model Is Easier Than You Think

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Lattice-Based Memory Allocation

IEEE Transactions on Computers
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies

International Journal of Parallel Programming
DRDU: A data reuse analysis technique for efficient scratch-pad memory management

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time

Proceedings of the International Symposium on Code Generation and Optimization
Incremental hierarchical memory size estimation for steering of loop transformations

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Trade-offs in loop transformations

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Precise Management of Scratchpad Memories for Localising Array Accesses in Scientific Codes

CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Polyhedral-Model Guided Loop-Nest Auto-Vectorization

PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Combining data reuse with data-level parallelization for FPGA-targeted hardware compilation: a geometric programming framework

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Bridging the gap between compilation and synthesis in the DEFACTO system

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
isl: an integer set library for the polyhedral model

ICMS'10 Proceedings of the Third international congress conference on Mathematical software
Accelerating Fluid Registration Algorithm on Multi-FPGA Platforms

FPL '11 Proceedings of the 2011 21st International Conference on Field Programmable Logic and Applications
Optimizing SDRAM bandwidth for custom FPGA loop accelerators

Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Domain-specific processor with 3D integration for medical image processing

ASAP '11 Proceedings of the ASAP 2011 - 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors
Optimizing memory hierarchy allocation with loop transformations for high-level synthesis

Proceedings of the 49th Annual Design Automation Conference
Local memory exploration and optimization in embedded systems

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
High-Level Synthesis for FPGAs: From Prototyping to Deployment

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Improving high level synthesis optimization opportunity through polyhedral transformations

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays

Improving high level synthesis optimization opportunity through polyhedral transformations

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Improving polyhedral code generation for high-level synthesis

Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis

Quantified Score

Hi-index	0.00

Visualization

Abstract

Many applications, such as medical imaging, generate intensive data traffic between the FPGA and off-chip memory. Significant improvements in the execution time can be achieved with effective utilization of on-chip (scratchpad) memories, associated with careful software-based data reuse and communication scheduling techniques. We present a fully automated C-to-FPGA framework to address this problem. Our framework effectively implements data reuse through aggressive loop transformation-based program restructuring. In addition, our proposed framework automatically implements critical optimizations for performance such as task-level parallelization, loop pipelining, and data prefetching. We leverage the power and expressiveness of the polyhedral compilation model to develop a multi-objective optimization system for off-chip communications management. Our technique can satisfy hardware resource constraints (scratchpad size) while still aggressively exploiting data reuse. Our approach can also be used to reduce the on-chip buffer size subject to bandwidth constraint. We also implement a fast design space exploration technique for effective optimization of program performance using the Xilinx high-level synthesis tool.