Trade-offs in loop transformations

Authors:
Martin Palkovic;Francky Catthoor;Henk Corporaal
Affiliations:
IMEC, Leuven, Belgium;IMEC, Leuven, Belgium;Technische Universiteit Eindhoven, AZ Eindhoven, The Netherlands
Venue:
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Year:
2009

Abstract

Multimedia systems perform huge numbers of memory accesses and have large memory footprints. To alleviate the impact of these accesses and reduce the footprint, high-level memory exploration and optimization techniques have been proposed that use the memory hierarchy more efficiently. An important step in these techniques is loop transformation (LT), which has a crucial effect on later data memory footprint optimization steps and on code generation. However, state-of-the-art work has focused only on individual objectives. The main objective in the literature is improving the locality of data accesses, and thereby reducing the data memory footprint. Such work does not consider the trade-offs in the LT step in relation to successive optimization steps, and is therefore not globally efficient in mapping the application onto the target platform. In this article we discuss several trade-offs that arise during loop transformations. To our knowledge, we are the first to consider these global trade-offs. Previous work has mostly produced a single solution, the one with the best locality and thus the optimized memory footprint, although some research on two-dimensional trade-offs in this area exists as well. We start from this state-of-the-art solution with minimal footprint. We show that by sacrificing some footprint, we can gain data reuse (crucial for energy reduction) and reduce control-flow complexity. We demonstrate our approach on a real-life application, the QSDPCM video coder. Finally, we show that considering trade-offs for this application leads to a 16% energy reduction in a two-layer memory subsystem and a 10% cycle reduction on the ARM platform.
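
As a rough illustration of why a locality-improving loop transformation also shrinks the data memory footprint, the sketch below contrasts two versions of a toy computation. It is not taken from the article: the function names `unfused` and `fused`, the toy arithmetic, and the hand-counted footprint values are all assumptions made for this example, and it shows only the minimal-footprint starting point, not the trade-off exploration the abstract describes.

```python
def unfused(a):
    """Two separate loops: the whole intermediate array stays live between them."""
    n = len(a)
    tmp = [0] * n                  # intermediate buffer of n elements
    for i in range(n):
        tmp[i] = a[i] * 2          # producer loop
    out = [0] * n
    for i in range(n):
        out[i] = tmp[i] + 1        # consumer loop reads tmp much later
    return out, n                  # result, hand-counted intermediate footprint


def fused(a):
    """Loop fusion: each intermediate value is consumed right after it is produced."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        t = a[i] * 2               # a single scalar replaces the buffer
        out[i] = t + 1
    return out, 1                  # result, hand-counted intermediate footprint


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(unfused(data))
    print(fused(data))
```

Both versions compute the same result; only the lifetime of the intermediate data differs. The fused version is the kind of single best-locality solution that the abstract attributes to previous work.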