Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
The Omega Library interface guide
The Omega Library interface guide
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Formalized methodology for data reuse exploration in hierarchical memory mappings
ISLPED '97 Proceedings of the 1997 international symposium on Low power electronics and design
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache conscious data layout organization for embedded multimedia applications
Proceedings of the conference on Design, automation and test in Europe
Dynamic management of scratch-pad memory space
Proceedings of the 38th annual Design Automation Conference
Storage allocation for embedded processors
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Compiler-directed scratch pad memory hierarchy design and management
Proceedings of the 39th annual Design Automation Conference
An optimal memory allocation scheme for scratch-pad-based embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
Efficient Utilization of Scratch-Pad Memory in Embedded Processor Applications
EDTC '97 Proceedings of the 1997 European conference on Design and Test
Assigning Program and Data Objects to Scratchpad for Energy Reduction
Proceedings of the conference on Design, automation and test in Europe
Data Reuse Exploration Techniques for Loop-Dominated Applications
Proceedings of the conference on Design, automation and test in Europe
Compiler-decided dynamic memory allocation for scratch-pad based embedded systems
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Data Reuse Analysis Technique for Software-Controlled Memory Hierarchies
Proceedings of the conference on Design, automation and test in Europe - Volume 1
Data compression for improving SPM behavior
Proceedings of the 41st annual Design Automation Conference
Dynamic overlay of scratchpad memory for energy minimization
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Control Flow Driven Splitting of Loop Nests at the Source Code Level
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Layer Assignment echniques for Low Energy in Multi-Layered Memory Organisations
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
FORAY-GEN: Automatic Generation of Affine Functions for Memory Optimizations
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Data partitioning for maximal scratchpad usage
ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Compiler driven data layout optimization for regular/irregular array access patterns
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Multiprocessor, Multithreading and Memory Optimization for On-Chip Multimedia Applications
Journal of Signal Processing Systems
Adaptive scratch pad memory management for dynamic behavior of multimedia applications
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
An energy-efficient adaptive hybrid cache
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
A reuse-aware prefetching scheme for scratchpad memory
Proceedings of the 48th Design Automation Conference
Combined loop transformation and hierarchy allocation for data reuse optimization
Proceedings of the International Conference on Computer-Aided Design
Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Optimizing local memory allocation and assignment through a decoupled approach
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Optimizing memory hierarchy allocation with loop transformations for high-level synthesis
Proceedings of the 49th Annual Design Automation Conference
A decoupled local memory allocator
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Improving high level synthesis optimization opportunity through polyhedral transformations
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Polyhedral-based data reuse optimization for configurable computing
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA
Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
In multimedia and other streaming applications, a significant portion of energy is spent on data transfers. Exploiting data reuse opportunities in the application, we can reduce this energy by making copies of frequently used data in a small local memory and replacing speed- and power-inefficient transfers from main off-chip memory by more efficient local data transfers. In this article we present an automated approach for analyzing these opportunities in a program that allows modification of the program to use custom scratch-pad memory configurations comprising a hierarchical set of buffers for local storage of frequently reused data. Using our approach we are able to both reduce energy consumption of the memory subsystem when using a scratch-pad memory by about a factor of two, on average, and improve memory system performance compared to a cache of the same size.