Incremental hierarchical memory size estimation for steering of loop transformations

Authors:
Q. Hu; P. G. Kjeldsberg; A. Vandecappelle; M. Palkovic; F. Catthoor
Affiliations:
Norwegian University of Science and Technology, Trondheim, Norway; Norwegian University of Science and Technology, Trondheim, Norway; IMEC, Leuven, Belgium; IMEC, Leuven, Belgium; IMEC, Leuven, Belgium
Venue:
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Year:
2007

Abstract

Modern embedded multimedia and telecommunications systems need to store and access huge amounts of data. This becomes a critical factor for the overall energy consumption, area, and performance of these systems. Loop transformations are essential for improving data access locality and regularity so that a memory hierarchy can be designed or utilized optimally. However, because they rely on abstract high-level cost functions, current loop transformation steering techniques do not take the memory platform sufficiently into account. They also usually produce only one final transformation solution. At the same time, the loop transformation search space for real-life applications is huge, especially if the memory platform is not yet fully fixed. Existing loop transformation techniques will therefore typically lead to suboptimal end products. It is critical to find all interesting loop transformation instances, which can only be achieved by evaluating the effect of later design stages at the early loop transformation stage. This article presents a fast incremental hierarchical memory-size requirement estimation technique. It estimates the influence of any given sequence of loop transformation instances on the mapping of application data onto a hierarchical memory platform. As the exact memory platform instantiation is often not yet defined at this high-level design stage, a platform-independent estimation is introduced with a Pareto curve output for each loop transformation instance. Comparing the Pareto curves helps the designer, or a steering tool, to find all interesting loop transformation instances that might later lead to low-power data mapping for any of the many possible memory hierarchy instances. Initially, the source code is used as input for estimation. However, performing the estimation repeatedly from the source code is too slow for large search space exploration. An incremental approach, based on local updating of the previous result, is therefore used to handle sequences of different loop transformations. Experiments show that the initial approach takes a few seconds, which is two orders of magnitude faster than state-of-the-art solutions but still too costly to be performed interactively many times. The incremental approach typically takes just a few milliseconds, which is another two orders of magnitude faster than the initial approach. This huge speedup allows us, for the first time, to handle real-life industrial-size applications and get realistic feedback during loop transformation exploration.
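
The comparison among Pareto curves mentioned in the abstract can be illustrated with a small sketch. It is not the article's estimation technique: it assumes, purely for illustration, that each loop transformation instance has already been reduced to a list of (memory size, access cost) points where lower is better on both axes. The function names, the transformation labels, and all numbers are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed representation: (memory size, access cost), both to be minimized.
Point = Tuple[float, float]


def pareto_front(points: Iterable[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by increasing memory size."""
    front: List[Point] = []
    best_cost = float("inf")
    for size, cost in sorted(set(points)):
        # Sorted by size, so a point survives only if it lowers the cost.
        if cost < best_cost:
            front.append((size, cost))
            best_cost = cost
    return front


def is_interesting(candidate: List[Point], others: List[List[Point]]) -> bool:
    """Keep a curve if at least one of its points is not dominated
    by any point on any other curve."""
    rivals = [p for curve in others for p in curve]
    return any(
        not any(r[0] <= c[0] and r[1] <= c[1] and r != c for r in rivals)
        for c in candidate
    )


def interesting_instances(curves: Dict[str, List[Point]]) -> List[str]:
    """Names of the instances whose Pareto curve contributes a useful point."""
    fronts = {name: pareto_front(pts) for name, pts in curves.items()}
    return [
        name
        for name, front in fronts.items()
        if is_interesting(front, [f for n, f in fronts.items() if n != name])
    ]


# Hypothetical curves for three loop transformation instances.
curves = {
    "original": [(64, 100), (128, 80), (256, 75)],
    "fused": [(32, 90), (128, 60)],
    "shifted": [(64, 95), (512, 50), (128, 85)],
}
kept = interesting_instances(curves)
```

With these made-up numbers, every point of the `original` curve is dominated by a point of `fused`, so only `fused` and `shifted` would be kept for further exploration. Keeping a whole curve whenever it offers one useful point is a deliberately conservative choice, in the spirit of retaining every instance that might suit some memory hierarchy.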