The ever-increasing gap between processor and memory speeds has motivated the design of embedded systems with deeper cache hierarchies. To avoid excessive miss rates without resorting to bigger cache memories and more complex cache controllers, program transformations have been proposed that reduce the number of capacity and conflict misses. These transformations, however, complicate the memory index arithmetic code, which degrades performance when the code is executed on programmable processors with limited addressing capabilities. When they are complemented by high-level address code transformations, the overhead introduced can be largely eliminated at compile time.
In this paper, the clear benefits of the combined approach are illustrated on two real-life applications of industrial relevance, using popular programmable processor architectures. The results show important gains in energy (a factor of 2 less) with a relatively small penalty in execution time (8-25%), rather than the factors of overhead incurred without the address optimisation stage. The results of this paper lead to a systematic Pareto-optimal trade-off (supported by tools) between memory power and CPU cycles, which has up to now not been feasible for the targeted systems.
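As a rough illustration of the kind of index-arithmetic overhead described above, the sketch below assumes a hypothetical tile-interleaved data layout (not taken from the paper) in which every access needs a division and a modulo. It then shows an equivalent address sequence produced with running counters only, in the spirit of a strength-reduction style address transformation. The function names, the layout, and the parameters `tile`, `num_arrays`, and `k` are assumptions made for this example.

```python
def naive_addresses(n, tile, num_arrays, k):
    """Addresses of elements 0..n-1 of array k in a tile-interleaved layout.

    Every access recomputes the address with a division and a modulo,
    which is costly on processors with limited addressing support.
    """
    return [
        (i // tile) * (tile * num_arrays) + k * tile + (i % tile)
        for i in range(n)
    ]


def reduced_addresses(n, tile, num_arrays, k):
    """Same address sequence, using only additions and one comparison.

    The division and modulo are replaced by two running counters that
    are updated incrementally from one access to the next.
    """
    addrs = []
    addr = k * tile                   # address of element 0 of array k
    offset = 0                        # position inside the current tile
    skip = tile * (num_arrays - 1)    # gap occupied by the other arrays
    for _ in range(n):
        addrs.append(addr)
        addr += 1
        offset += 1
        if offset == tile:            # leaving the tile: jump over the gap
            offset = 0
            addr += skip
    return addrs
```

Both functions generate identical address sequences; the second simply trades the per-access division and modulo for an increment and a conditional jump, which is the type of overhead removal the abstract attributes to the address optimisation stage.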