Systematic speed-power memory data-layout exploration for cache controlled embedded multimedia applications

  • Authors:
  • M. Miranda, C. Ghez, C. Kulkarni, F. Catthoor, D. Verkest

  • Affiliations:
  • IMEC Lab., Leuven, Belgium (all authors); F. Catthoor also with Katholieke Universiteit Leuven, Belgium

  • Venue:
  • Proceedings of the 14th international symposium on Systems synthesis
  • Year:
  • 2001

Abstract

The ever increasing gap between processor and memory speeds has motivated the design of embedded systems with deeper cache hierarchies. To avoid excessive miss rates, instead of using bigger cache memories and more complex cache controllers, program transformations have been proposed that reduce the amount of capacity and conflict misses. This is achieved, however, by complicating the memory index arithmetic, which degrades performance when the code is executed on programmable processors with limited addressing capabilities. When these transformations are complemented by high-level address code transformations, the overhead introduced can be largely eliminated at compile time. In this paper, the clear benefits of the combined approach are illustrated on two real-life applications of industrial relevance, using popular programmable processor architectures and showing important gains in energy (a factor of 2 less) with a relatively small penalty in execution time (8-25%), instead of the factors of overhead incurred without the address optimisation stage. The results of this paper lead to a systematic, tool-supported Pareto-optimal trade-off between memory power and CPU cycles, which has up to now not been feasible for the targeted systems.
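To make the trade-off concrete, the following is a minimal sketch (not the authors' actual transformations) of how a data-layout change can complicate index arithmetic, and how a high-level address optimisation such as strength reduction removes that overhead. The array size `N` and padding `PAD` are hypothetical values chosen for illustration only.

```c
#define N   64
#define PAD 1  /* hypothetical row padding to break cache conflict misses */

/* After the layout transformation: padded rows avoid conflicts, but every
   access now pays a multiply in the index expression i * (N + PAD) + j. */
int sum_padded_naive(const int *a) {
    int s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i * (N + PAD) + j];   /* multiply per access */
    return s;
}

/* After address code optimisation: the multiply is strength-reduced to an
   additive pointer update, recovering most of the lost cycles while keeping
   the cache-friendly padded layout. */
int sum_padded_optimised(const int *a) {
    int s = 0;
    const int *row = a;                  /* induction variable for the row base */
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            s += row[j];
        row += N + PAD;                  /* additive address update only */
    }
    return s;
}
```

Both routines traverse the same padded layout and compute the same sum; only the address computation differs, which is the overhead the paper's address optimisation stage targets.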