Optimizing memory hierarchy allocation with loop transformations for high-level synthesis

Authors: Jason Cong; Peng Zhang; Yi Zou
Affiliation: University of California, Los Angeles, CA (all authors)
Venue: Proceedings of the 49th Annual Design Automation Conference
Year: 2012

Abstract

For most computation-intensive applications, off-chip memory bandwidth is a critical bottleneck for both performance and power consumption. Efficient use of the limited on-chip memory resources plays a vital role in reducing off-chip memory accesses. This paper presents an efficient approach to optimizing on-chip memory allocation through loop transformations of imperfectly nested loops. We analytically model the on-chip buffer size and off-chip bandwidth that result from affine loop transformations, loop fusion/distribution, and code motion. We propose branch-and-bound and knapsack reuse techniques to reduce the computational complexity of finding optimal solutions. Experimental results show that, compared with previous approaches, our scheme reduces on-chip memory size by 40% with the same off-chip bandwidth consumption.
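The pairing of a knapsack formulation with a branch-and-bound style search can be illustrated with a small Python sketch. This is not the authors' algorithm. It assumes that, for a given loop transformation, each candidate on-chip buffer has a fixed size and a fixed, additive reduction in off-chip accesses. The class `ReuseCandidate`, the functions `knapsack_select`, `lower_bound`, and `best_transformation`, the transformation labels, and all numbers are hypothetical. For each candidate transformation, a 0/1 knapsack picks the buffers that save the most off-chip traffic within the on-chip budget. Transformations are visited best-first by an optimistic bound (the traffic if every buffer fit), and the search stops once that bound can no longer beat the best solution found so far. The paper's analytic buffer-size and bandwidth model and its knapsack reuse technique are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class ReuseCandidate:
    """Hypothetical on-chip buffer option for one array under one loop order."""
    name: str
    buffer_words: int    # on-chip buffer size needed if this array is buffered
    saved_accesses: int  # off-chip accesses avoided by buffering it (assumed additive)


def knapsack_select(cands: Sequence[ReuseCandidate], budget: int) -> Tuple[int, List[str]]:
    """0/1 knapsack: maximize avoided off-chip accesses within `budget` words."""
    # best[c] = (max saving, chosen names) using at most c words of buffer.
    best: List[Tuple[int, List[str]]] = [(0, [])] * (budget + 1)
    for cand in cands:
        # Walk capacities downward so each candidate is chosen at most once.
        for c in range(budget, cand.buffer_words - 1, -1):
            saving, chosen = best[c - cand.buffer_words]
            if saving + cand.saved_accesses > best[c][0]:
                best[c] = (saving + cand.saved_accesses, chosen + [cand.name])
    return best[budget]


def lower_bound(baseline: int, cands: Sequence[ReuseCandidate]) -> int:
    """Optimistic traffic: assume every candidate fits on chip (savings additive)."""
    return baseline - sum(c.saved_accesses for c in cands)


def best_transformation(
    options: Dict[str, Tuple[int, Sequence[ReuseCandidate]]], budget: int
) -> Tuple[str, float, List[str]]:
    """Choose the loop transformation with the least remaining off-chip traffic.

    `options` maps a hypothetical transformation label to
    (off-chip accesses with no buffering, that option's reuse candidates).
    """
    best_label, best_traffic, best_chosen = "", float("inf"), []
    # Best-first: visit options in increasing order of their optimistic bound.
    ordered = sorted(options.items(), key=lambda kv: lower_bound(*kv[1]))
    for label, (baseline, cands) in ordered:
        if lower_bound(baseline, cands) >= best_traffic:
            break  # every remaining option has an equal or larger bound: prune
        saving, chosen = knapsack_select(cands, budget)
        if baseline - saving < best_traffic:
            best_label, best_traffic, best_chosen = label, baseline - saving, chosen
    return best_label, best_traffic, best_chosen


# Hypothetical example: three loop orders with different buffer sizes and savings.
options = {
    "original order": (1000, [ReuseCandidate("A", 64, 300), ReuseCandidate("B", 128, 500)]),
    "interchanged": (1000, [ReuseCandidate("A", 16, 300), ReuseCandidate("B", 256, 600)]),
    "tiled": (1000, [ReuseCandidate("A", 32, 150), ReuseCandidate("B", 64, 250)]),
}
result = best_transformation(options, budget=150)
print(result)  # → ('original order', 500, ['B'])
```

Sorting by the bound turns pruning into a simple `break`: in this example the `"tiled"` option is never solved, because its optimistic traffic (600 accesses) already exceeds the 500 accesses achieved by `"original order"`.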