For most computation-intensive applications, off-chip memory bandwidth is a critical bottleneck for both performance and power consumption. Efficient use of limited on-chip memory resources plays a vital role in reducing off-chip memory accesses. This paper presents an efficient approach for optimizing on-chip memory allocation through loop transformations in imperfectly nested loops. We analytically model the on-chip buffer size and off-chip bandwidth after affine loop transformation, loop fusion/distribution, and code motion. Branch-and-bound and knapsack reuse techniques are proposed to reduce the computational complexity of finding optimal solutions. Experimental results show that our scheme can save 40% of on-chip memory size at the same bandwidth consumption compared with previous approaches.
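
The abstract does not spell out how the knapsack formulation is set up. As a generic illustration only, and not the paper's algorithm, the sketch below shows the kind of 0/1 knapsack problem that arises when candidate reuse buffers compete for limited on-chip capacity: each candidate has a size and an estimated number of off-chip accesses it saves, and the goal is to maximize total savings within the capacity. The function name `select_buffers`, the candidate names, and all numbers are hypothetical.

```python
def select_buffers(candidates, capacity):
    """Pick reuse buffers that maximize saved off-chip accesses.

    candidates: list of (name, size_words, saved_accesses) tuples (hypothetical data).
    capacity:   on-chip memory budget in words.
    Returns (total_saved_accesses, tuple_of_chosen_names).
    """
    # best[c] holds the best (saved, chosen) pair using at most c words.
    best = [(0, ())] * (capacity + 1)
    for name, size, saved in candidates:
        # Walk capacity downward so each candidate is chosen at most once.
        for c in range(capacity, size - 1, -1):
            prev_saved, prev_chosen = best[c - size]
            if prev_saved + saved > best[c][0]:
                best[c] = (prev_saved + saved, prev_chosen + (name,))
    return best[capacity]


# Illustrative inputs only; these figures do not come from the paper.
candidates = [
    ("A_row", 64, 4000),
    ("B_tile", 256, 9000),
    ("C_col", 128, 5000),
]
result = select_buffers(candidates, capacity=320)
print(result)
```

With a 320-word budget, the two largest savers together (`B_tile` and `C_col`) need 384 words and do not fit, so the dynamic program has to weigh the remaining combinations against each other. An exhaustive search over all subsets would give the same answer but grows exponentially with the number of candidates, which is the kind of cost that pruning techniques such as branch-and-bound aim to avoid.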