In multimedia and other streaming applications, a significant portion of energy is spent on data transfers. By exploiting data reuse opportunities in the application, we can reduce this energy: copies of frequently used data are kept in a small local memory, and slow, power-hungry transfers from off-chip main memory are replaced by more efficient local transfers. In this paper we present an automated approach for analyzing these opportunities in a program. The analysis allows the program to be modified to use custom scratch pad memory configurations comprising a hierarchical set of buffers for local storage of frequently reused data. Using our approach, we reduce the energy consumption of the memory subsystem by a factor of two on average when using a scratch pad memory, compared to a cache of the same size.
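The basic idea of replacing main-memory transfers with local ones can be illustrated with a small sketch. This is not the paper's analysis or tool; it is a hand-written toy that assumes a sliding-window kernel, a single software-managed buffer of `k` elements, and hypothetical per-access energy costs (`e_main`, `e_local`, in arbitrary units). All function names and parameters are invented for illustration.

```python
from collections import deque


def window_sums_direct(a, k):
    """Sliding-window sum where every operand is read from main memory."""
    main_reads = 0
    out = []
    for i in range(len(a) - k + 1):
        s = 0
        for j in range(k):
            s += a[i + j]      # modelled as one main-memory read
            main_reads += 1
        out.append(s)
    return out, main_reads


def window_sums_buffered(a, k):
    """Same kernel, but the last k elements are kept in a local buffer.

    Each element is fetched from main memory once, copied into the
    buffer, and then reused from the buffer by up to k windows.
    """
    buf = deque(maxlen=k)      # stands in for a scratch pad buffer
    main_reads = local_writes = local_reads = 0
    out = []
    for x in a:
        buf.append(x)          # one main-memory read, one local write
        main_reads += 1
        local_writes += 1
        if len(buf) == k:
            s = 0
            for v in buf:      # operands now come from the local buffer
                s += v
                local_reads += 1
            out.append(s)
    return out, main_reads, local_writes, local_reads


def energy(main_accesses, local_accesses, e_main=10.0, e_local=1.0):
    """Toy energy model; both costs are hypothetical, arbitrary units."""
    return main_accesses * e_main + local_accesses * e_local


if __name__ == "__main__":
    data = list(range(16))
    k = 4
    ref, direct_main = window_sums_direct(data, k)
    res, buf_main, buf_writes, buf_reads = window_sums_buffered(data, k)
    assert ref == res          # both versions compute the same result
    print("direct energy:  ", energy(direct_main, 0))
    print("buffered energy:", energy(buf_main, buf_writes + buf_reads))
```

In this toy, the buffered version reads each input element from main memory once instead of once per window that uses it, at the price of extra local accesses. Whether that trade pays off depends on the real cost ratio between the two memories, which is the kind of question a reuse analysis has to answer for each candidate buffer.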