In today's embedded systems, the memory hierarchy is rapidly becoming a major factor in power, performance, and area. This is especially true for embedded multimedia applications, which typically use temporary multi-dimensional arrays to store intermediate results during processing. In this paper, we propose a new technique that optimizes the use of the cache and the registers by combining buffer allocation and register allocation to reduce the size of the temporary arrays. First, we use the concept of live data to replace each array with a smaller buffer. Then we replace references to these buffers with registers. The buffer allocation step keeps only useful data in memory, and the register allocation step takes advantage of data reuse in inner loops. The codes considered in this paper are multimedia applications structured as a sequence of loop nests. The experiments were run in a Unix environment and on the StepNP simulator (the MPSoC platform of STMicroelectronics). They show that our technique significantly reduces the number of data cache and TLB misses.
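The two steps can be illustrated with a minimal sketch in C. This is not the paper's algorithm or benchmark code: the one-dimensional 3-point kernel, the size `N`, and the function names `baseline`, `buffered`, `registers`, and `check_variants` are all assumptions chosen for illustration. The sketch only shows the general idea: a temporary array whose live data is small can shrink to a short buffer once the producer and consumer loops are merged, and the buffer references can then become scalars that a compiler is likely to keep in registers.

```c
#include <assert.h>
#include <string.h>

#define N 16 /* hypothetical problem size, chosen for illustration */

/* Baseline: a producer loop fills a full temporary array,
 * then a consumer loop reads three neighbouring elements of it. */
void baseline(const int in[N], int out[N - 2]) {
    int tmp[N];
    for (int i = 0; i < N; i++)
        tmp[i] = 2 * in[i] + 1;
    for (int i = 0; i < N - 2; i++)
        out[i] = tmp[i] + tmp[i + 1] + tmp[i + 2];
}

/* Buffer allocation (sketch): once the two loops are merged, only three
 * values of tmp are live at any time, so a 3-element circular buffer
 * replaces the N-element array. */
void buffered(const int in[N], int out[N - 2]) {
    int buf[3];
    for (int i = 0; i < N; i++) {
        buf[i % 3] = 2 * in[i] + 1;
        if (i >= 2)
            out[i - 2] = buf[(i - 2) % 3] + buf[(i - 1) % 3] + buf[i % 3];
    }
}

/* Register allocation (sketch): the buffer references become scalars that
 * are rotated each iteration, exposing the reuse in the inner loop. */
void registers(const int in[N], int out[N - 2]) {
    int r0 = 2 * in[0] + 1;
    int r1 = 2 * in[1] + 1;
    for (int i = 2; i < N; i++) {
        int r2 = 2 * in[i] + 1;
        out[i - 2] = r0 + r1 + r2;
        r0 = r1; /* shift the window of live values */
        r1 = r2;
    }
}

/* Hypothetical helper: checks that all three variants compute the same result. */
void check_variants(void) {
    int in[N], a[N - 2], b[N - 2], c[N - 2];
    for (int i = 0; i < N; i++)
        in[i] = i * i - 3 * i;
    baseline(in, a);
    buffered(in, b);
    registers(in, c);
    assert(memcmp(a, b, sizeof a) == 0);
    assert(memcmp(a, c, sizeof a) == 0);
}
```

Calling `check_variants()` from a test harness confirms that the three versions agree on this input. The storage for intermediate results drops from `N` elements to three, and then to scalars. The multi-dimensional arrays and loop-nest sequences treated in the paper would need a more careful analysis of which data is live than this one-dimensional case suggests.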