A compiler-based approach for dynamically managing scratch-pad memories in embedded systems

Authors:
M. Kandemir; J. Ramanujam; M. J. Irwin; N. Vijaykrishnan; I. Kadayif; A. Parikh
Affiliations:
Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA (listed for the first author; no affiliation given for the others)
Venue:
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Year:
2006

Cited by (24):

Compiler-Based Approach for Exploiting Scratch-Pad in Presence of Irregular Array Access
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2

Analysis of scratch-pad and data-cache performance using statistical methods
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference

On combining iteration space tiling with data space tiling for scratch-pad memory systems
Proceedings of the 2005 Asia and South Pacific Design Automation Conference

Compiler Optimizations to Reduce Security Overhead
Proceedings of the International Symposium on Code Generation and Optimization

Reuse analysis of indirectly indexed arrays
ACM Transactions on Design Automation of Electronic Systems (TODAES)

Reducing off-chip memory access via stream-conscious tiling on multimedia applications
International Journal of Parallel Programming

Compiler-managed partitioned data caches for low power
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems

Interactive presentation: A decoupled architecture of processors with scratch-pad memory hierarchy
Proceedings of the conference on Design, automation and test in Europe

Reducing off-chip memory access costs using data recomputation in embedded chip multi-processors
Proceedings of the 44th annual Design Automation Conference

Dynamic tag reduction for low-power caches in embedded systems with virtual memory
International Journal of Parallel Programming

Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming

Efficient vectorization of SIMD programs with non-aligned and irregular data access hardware
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems

Guidance of Loop Ordering for Reduced Memory Usage in Signal Processing Applications
Journal of Signal Processing Systems

Direct address translation for virtual memory in energy-efficient embedded systems
ACM Transactions on Embedded Computing Systems (TECS)

Compiler-Based Performance Evaluation of an SIMD Processor with a Multi-Bank Memory Unit
Journal of Signal Processing Systems

Access-pattern-aware on-chip memory allocation for SIMD processors
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Combining data reuse with data-level parallelization for FPGA-targeted hardware compilation: a geometric programming framework
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Scratchpad allocation for concurrent embedded software
ACM Transactions on Programming Languages and Systems (TOPLAS)

Decoupled Processors Architecture for Accelerating Data Intensive Applications using Scratch-Pad Memory Hierarchy
Journal of Signal Processing Systems

Efficient OpenMP support and extensions for MPSoCs with explicitly managed memory hierarchy
Proceedings of the Conference on Design, Automation and Test in Europe

VEGAS: soft vector processor with scratchpad memory
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays

TL-DAE: thread-level decoupled access/execution for OpenMP on the cyclops-64 many-core processor
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing

Write activity reduction on non-volatile main memories for embedded chip multiprocessors
ACM Transactions on Embedded Computing Systems (TECS)

Reducing Virtual-to-Physical address translation overhead in Distributed Shared Memory based multi-core Network-on-Chips according to data property
Computers and Electrical Engineering

Abstract

Optimizations that improve the efficiency of on-chip memories are extremely important in embedded systems. A suitable combination of program transformations and memory design space exploration aimed at enhancing data locality enables significant reductions in effective memory access latencies. While numerous compiler optimizations have been proposed to improve cache performance, relatively few techniques focus on software-managed on-chip memories. It is well known that software-managed memories are important in real-time embedded environments with hard deadlines, because they allow one to accurately predict how long a given code segment will take to execute. In this paper, we propose and evaluate a compiler-controlled dynamic on-chip scratch-pad memory (SPM) management framework. Our framework includes an optimization suite that uses loop and data transformations, an on-chip memory partitioning step, and a code-rewriting phase; together, these automatically transform the input code to take advantage of the on-chip SPM. Compared with previous work, the proposed scheme is dynamic: it allows the contents of the SPM to change during the course of execution, depending on changes in the data access pattern. Experimental results from our implementation, which uses a source-to-source translator and a generic cost model, indicate significant reductions in data transfer activity between the SPM and off-chip memory.
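
As a rough illustration of what "dynamic" SPM management means here, the sketch below tiles a matrix-vector product so that one tile of the vector is copied into a small software-managed buffer, reused across all rows, and then replaced by the next tile. It is not the paper's algorithm or cost model. The names `OffChipMemory`, `matvec_no_spm`, and `matvec_with_spm` are hypothetical, and the model counts only off-chip reads of the vector; the matrix is assumed to be streamed in both versions.

```python
class OffChipMemory:
    """Hypothetical off-chip array that counts every element read."""

    def __init__(self, data):
        self._data = list(data)
        self.reads = 0

    def read(self, index):
        self.reads += 1
        return self._data[index]


def matvec_no_spm(a_rows, v_mem):
    """Baseline: every use of the vector goes to off-chip memory."""
    out = []
    for row in a_rows:
        total = 0
        for j, a in enumerate(row):
            total += a * v_mem.read(j)
        out.append(total)
    return out


def matvec_with_spm(a_rows, v_mem, n_cols, spm_size):
    """Tiled version: one tile of the vector lives in the SPM at a time."""
    out = [0] * len(a_rows)
    for j0 in range(0, n_cols, spm_size):
        j1 = min(j0 + spm_size, n_cols)
        # Explicit copy-in: the SPM contents change for every tile.
        spm = [v_mem.read(j) for j in range(j0, j1)]
        # Reuse the tile across all rows without touching off-chip memory.
        for i, row in enumerate(a_rows):
            for j in range(j0, j1):
                out[i] += row[j] * spm[j - j0]
    return out
```

With a 3x4 matrix and an SPM that holds two elements, the baseline issues one off-chip vector read per multiply (12 in total), while the tiled version reads each vector element off-chip once (4 in total) and produces the same result. A real compiler framework would also have to decide which arrays to place in the SPM and how to partition it, which this sketch does not attempt.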