Buffer and Register Allocation for Memory Space Optimization

Authors:
Youcef Bouchebaba;Bruno Girodias;Fabien Coelho;Gabriela Nicolescu;El Mostapha Aboulhamid
Affiliations:
Génie Informatique, Ecole Polytechnique de Montréal, Montréal, Canada H3C 3A7;Génie Informatique, Ecole Polytechnique de Montréal, Montréal, Canada H3C 3A7;ENSMP/CRI, Fontainebleau, France 77305;Génie Informatique, Ecole Polytechnique de Montréal, Montréal, Canada H3C 3A7;Université de Montréal, Montréal, Canada H3C 3J7
Venue:
Journal of VLSI Signal Processing Systems
Year:
2007

Abstract

In today's embedded systems, the memory hierarchy is rapidly becoming a major factor in power, performance, and area. This is especially true for embedded multimedia applications, which typically use temporary multi-dimensional arrays to store intermediate results. In this paper, we propose a new technique that optimizes the use of the cache and the registers by combining buffer and register allocation to reduce the size of these temporary arrays. First, we use the concept of live data to replace each array with a smaller buffer. Then we replace references to these buffers with registers. The buffer allocation step keeps only useful data in memory, and the register allocation step exploits data reuse in the inner loops. The codes considered in this paper are multimedia applications structured as a sequence of loop nests. Experiments were carried out in a Unix environment and on the StepNP simulator (the MPSoC platform of STMicroelectronics). They show that our technique significantly reduces the number of data cache and TLB misses.
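
The two steps named in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a hypothetical one-dimensional, three-point stencil (the paper targets multi-dimensional arrays), it assumes the two loop nests have been fused so that only three elements of the temporary are live at any time, and all function names are invented for illustration. Whether the scalars in the last version actually end up in registers depends on the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline (hypothetical example): two loop nests that communicate
 * through a full-size temporary array tmp[0..n-1]. */
void stencil_full(const int *in, int *out, int *tmp, size_t n)
{
    assert(n >= 3);
    for (size_t i = 0; i < n; i++)
        tmp[i] = 2 * in[i];
    for (size_t i = 1; i + 1 < n; i++)
        out[i] = tmp[i - 1] + tmp[i] + tmp[i + 1];
}

/* Step 1, buffer allocation: once the nests are fused (an assumption
 * of this sketch), only tmp[i-1], tmp[i], tmp[i+1] are live at
 * iteration i, so a 3-entry circular buffer replaces the array. */
void stencil_buffer(const int *in, int *out, size_t n)
{
    int buf[3];
    assert(n >= 3);
    buf[0] = 2 * in[0];
    buf[1] = 2 * in[1];
    for (size_t i = 1; i + 1 < n; i++) {
        buf[(i + 1) % 3] = 2 * in[i + 1];   /* overwrites a dead value */
        out[i] = buf[(i - 1) % 3] + buf[i % 3] + buf[(i + 1) % 3];
    }
}

/* Step 2, register allocation: the buffer references become scalars
 * that are rotated each iteration, so reused values need no memory
 * access if the compiler keeps them in registers. */
void stencil_regs(const int *in, int *out, size_t n)
{
    assert(n >= 3);
    int t0 = 2 * in[0];
    int t1 = 2 * in[1];
    for (size_t i = 1; i + 1 < n; i++) {
        int t2 = 2 * in[i + 1];
        out[i] = t0 + t1 + t2;
        t0 = t1;
        t1 = t2;
    }
}
```

All three versions write the same values to out[1..n-2] and leave out[0] and out[n-1] untouched. The temporary storage shrinks from n elements to three buffer entries and then to three scalars.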