Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Performance tradeoffs in cache design
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
An Analytical Model for Designing Memory Hierarchies
IEEE Transactions on Computers
Synthesis of application-specific memory designs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
SUSAN—A New Approach to Low Level Image Processing
International Journal of Computer Vision
Formalized methodology for data reuse exploration in hierarchical memory mappings
ISLPED '97 Proceedings of the 1997 international symposium on Low power electronics and design
Improving Cache Locality by a Combination of Loop and Data Transformations
IEEE Transactions on Computers - Special issue on cache memory and related problems
Exact memory size estimation for array computations without loop unrolling
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Memory binding for performance optimization of control-flow intensive behaviors
ICCAD '99 Proceedings of the 1999 IEEE/ACM international conference on Computer-aided design
A recursive algorithm for low-power memory partitioning
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Dynamic management of scratch-pad memory space
Proceedings of the 38th annual Design Automation Conference
Systematic data reuse exploration methodology for irregular access patterns
ISSS '00 Proceedings of the 13th international symposium on System synthesis
Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design
Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration
Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration
Issues in Multi-Level Cache Designs
ICCS '94 Proceedings of the1994 IEEE International Conference on Computer Design: VLSI in Computer & Processors
Memory Organization for Improved Data Cache Performance in Embedded Processors
ISSS '96 Proceedings of the 9th international symposium on System synthesis
Data Reuse Exploration Techniques for Loop-Dominated Applications
Proceedings of the conference on Design, automation and test in Europe
Aspects of cache memory and instruction buffer performance
Aspects of cache memory and instruction buffer performance
Elimination of redundant memory traffic in high-level synthesis
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Efficient general-purpose image compression with binary tree predictive coding
IEEE Transactions on Image Processing
Data Reuse Analysis Technique for Software-Controlled Memory Hierarchies
Proceedings of the conference on Design, automation and test in Europe - Volume 1
Hierarchical memory size estimation for loop fusion and loop shifting in data-dominated applications
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
DRDU: A data reuse analysis technique for efficient scratch-pad memory management
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hi-index | 0.01 |
Efficient exploitation of temporal locality in the memory accesses on array signals can have a very large impact on the power consumption in embedded data dominated applications. The effective use of an optimized custom memory hierarchy or a customized software controlled mapping on a predefined hierarchy is crucial for this. Only recently have effective systematic techniques to deal with this specific design step begun to appear. They are still limited in their exploration scope. In this paper we construct the design space by introducing three parameters which determine how and when copies are made between different levels in a hierarchy, and determine their impact on the total memory size, storage-related power consumption, and code complexity. Strategies are then established for an efficient exploration, such that cost-effective solutions for the memory size/power trade-off can be achieved. The effectiveness of the techniques is demonstrated for several real-life image processing algorithms.