Optimizations that improve the efficiency of on-chip memories are critical in embedded systems. Combining program transformations with memory design space exploration to enhance data locality can significantly reduce effective memory access latencies. While numerous compiler optimizations have been proposed to improve cache performance, relatively few techniques target software-managed on-chip memories. Such memories are well known to be important in real-time embedded environments with hard deadlines, because they allow the execution time of a given code segment to be predicted accurately. In this paper, we propose and evaluate a compiler-controlled dynamic on-chip scratch-pad memory (SPM) management framework. The framework comprises an optimization suite based on loop and data transformations, an on-chip memory partitioning step, and a code-rewriting phase; together these automatically transform an input code to take advantage of the on-chip SPM. Compared with previous work, the proposed scheme is dynamic: the contents of the SPM can change during execution as the data access pattern changes. Experimental results from our implementation, which uses a source-to-source translator and a generic cost model, indicate significant reductions in data transfer activity between the SPM and off-chip memory.
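To make the idea of compiler-inserted SPM transfers concrete, the following is a minimal illustrative sketch, not the framework described above. It assumes a toy loop nest in which an array `a` is reused across an outer loop over `w`, models the SPM as a small Python list, and counts only reads of `a` from off-chip memory (reads of `w` and all writes are ignored for simplicity). The function names, the `spm_size` parameter, and the counting scheme are all hypothetical.

```python
def run_without_spm(a, w):
    """Original loop nest: every read of a[i] is charged as an off-chip access."""
    off_chip_reads = 0
    out = [0] * len(w)
    for t in range(len(w)):
        for i in range(len(a)):
            out[t] += a[i] * w[t]
            off_chip_reads += 1
    return out, off_chip_reads


def run_with_spm(a, w, spm_size):
    """Tiled and rewritten loop nest (hypothetical compiler output).

    Each tile of `a` is copied into the SPM once and then reused for
    every value of t, so the SPM contents change as execution proceeds.
    """
    off_chip_reads = 0
    out = [0] * len(w)
    for start in range(0, len(a), spm_size):
        spm = a[start:start + spm_size]  # explicit off-chip -> SPM copy
        off_chip_reads += len(spm)
        for t in range(len(w)):
            for x in spm:                # all reuse is served from the SPM
                out[t] += x * w[t]
    return out, off_chip_reads


if __name__ == "__main__":
    a = list(range(16))
    w = [1, 2, 3]
    print(run_without_spm(a, w))
    print(run_with_spm(a, w, spm_size=4))
```

In this toy model both versions compute the same result, but the tiled version reads each element of `a` from off-chip memory once instead of once per outer iteration. A real framework would also need to account for write-backs, transfer cost, and how the SPM is partitioned among arrays.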