Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiler Optimizations for Enhancing Parallelism and Their Impact on Architecture Design
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Optimization of array accesses by collective loop transformations
ICS '91 Proceedings of the 5th international conference on Supercomputing
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
The Alpha du Centaur experiment
Proceedings of the international workshop on Algorithms and parallel VLSI architectures II
Some efficient solutions to the affine scheduling problem: I. One-dimensional time
International Journal of Parallel Programming
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Affine-by-statement scheduling of uniform and affine loop nests over parametric domains
Journal of Parallel and Distributed Computing
Parallel Computing - Special issue on applications: parallel processing and multimedia
SpC: synthesis of pointers in C: application of pointer analysis to the behavioral synthesis from C
Proceedings of the 1998 IEEE/ACM international conference on Computer-aided design
Parametric Analysis of Polyhedral Iteration Spaces
Journal of VLSI Signal Processing Systems - Special issue on application specific systems, architectures and processors
Generation of Efficient Nested Loops from Polyhedra
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
A preprocessing step for global loop transformations for data transfer optimization
CASES '00 Proceedings of the 2000 international conference on Compilers, architecture, and synthesis for embedded systems
Data locality enhancement by memory reduction
ICS '01 Proceedings of the 15th international conference on Supercomputing
A unified framework for schedule and storage optimization
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Energy-Aware Runtime Scheduling for Embedded-Multiprocessor SOCs
IEEE Design & Test
A Layout-Conscious Iteration Space Transformation Technique
IEEE Transactions on Computers
Array recovery and high-level transformations for DSP applications
ACM Transactions on Embedded Computing Systems (TECS)
A Singular Loop Transformation Framework Based on Non-Singular Matrices
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
Advanced copy propagation for arrays
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Automatic data mapping of signal processing applications
ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
Loop Alignment for Memory Accesses Optimization
Proceedings of the 12th international symposium on System synthesis
Data Reuse Exploration Techniques for Loop-Dominated Applications
Proceedings of the conference on Design, automation and test in Europe
Control Flow Driven Splitting of Loop Nests at the Source Code Level
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Flexible and Formal Modeling of Microprocessors with Application to Retargetable Simulation
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Layer Assignment echniques for Low Energy in Multi-Layered Memory Organisations
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Code Generation in the Polyhedral Model Is Easier Than You Think
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Design of a low power pre-synchronization ASIP for multimode SDR terminals
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
Experience with Widening Based Equivalence Checking in Realistic Multimedia Systems
Journal of Electronic Testing: Theory and Applications
Combined loop transformation and hierarchy allocation for data reuse optimization
Proceedings of the International Conference on Computer-Aided Design
Optimizing memory hierarchy allocation with loop transformations for high-level synthesis
Proceedings of the 49th Annual Design Automation Conference
Improving last level cache locality by integrating loop and data transformations
Proceedings of the International Conference on Computer-Aided Design
Polyhedral-based data reuse optimization for configurable computing
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Hi-index | 0.00 |
Nowadays, multimedia systems deal with huge amounts of memory accesses and large memory footprints. To alleviate the impact of these accesses and reduce the memory footprint, high-level memory exploration and optimization techniques have been proposed. These techniques try to more efficiently utilize the memory hierarchy. An important step in these optimization techniques are loop transformations (LT). They have a crucial effect on later data memory footprint optimization steps and code generation. However, the state-of-the-art work has focused only on individual objectives. The main one in literature involves improving the locality of data accesses, and thus reducing the data memory footprint. It does not consider the trade-offs in the LT step in relation to successive optimization steps. Therefore, it is not globally efficient in mapping the application on the target platform. In this article we will discuss several trade-offs during the loop transformations. To our knowledge, we are the first ones considering these global trade-offs. Previous work always gave mostly one solution, having the best locality and thus the optimized memory footprint, even though some research in two-dimensional trade-offs in this area exists as well. We start from this state-of-the-art solution with minimal footprint. We show that by sacrificing the footprint, we can obtain gains in data reuse (crucial for energy reduction) and reduce the control-flow complexity. We demonstrate our approach on a real-life application, namely the QSDPCM video coder. At the end, we show that considering trade-offs for this application leads to 16% energy reduction in a two-layer memory subsystem and 10% cycle reduction on the ARM platform.