Memory allocation for embedded systems with a compile-time-unknown scratch-pad size

Authors:
Nghi Nguyen;Angel Dominguez;Rajeev Barua
Affiliations:
University of Maryland, College Park, MD;University of Maryland, College Park, MD;University of Maryland, College Park, MD
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2009

Abstract

This article presents the first memory allocation scheme for embedded systems having a scratch-pad memory whose size is unknown at compile time. A scratch-pad memory (SPM) is a fast compiler-managed SRAM that replaces the hardware-managed cache. All existing memory allocation schemes for SPM require the SPM size to be known at compile time. Because of this constraint, the resulting executable is tied to one SPM size and is not portable to other processor implementations having a different SPM size. Size-portable code is valuable when programs are downloaded during deployment, either via a network or from portable media; such downloads are used to fix bugs or to enhance functionality. Devices commonly differ in SPM size because VLSI technology evolves over the years. As a result, SPM cannot be used with downloaded code in such situations. To overcome this limitation, our work presents a compiler method whose resulting executable is portable across SPMs of any size. Our technique employs customized installer software that decides the SPM allocation just before the program's first run, since the SPM size can be discovered at that time. Based on the decided allocation, the installer then modifies the program executable. The resulting executable places frequently used objects in SPM, considering both code and data for placement. To keep the overhead low, much of the preprocessing for the allocation is done at compile time. Results show that our benchmarks average a 41% speedup versus an all-DRAM allocation, while the optimal static allocation scheme, which knows the SPM size at compile time and is thus an unachievable upper bound, is only slightly faster (45% faster than all-DRAM). Results also show that the overhead from our customized installer averages about 1.5% in code size, 2% in runtime, and 3% in compile time for our benchmarks.
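The split between compile-time preprocessing and install-time allocation can be illustrated with a minimal sketch. It is not the article's algorithm: the abstract does not say how objects are chosen, so the greedy accesses-per-byte heuristic, the `ProgramObject` type, the function names, and the sample profile numbers below are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgramObject:
    """A code or data object, as a profiler might describe it (hypothetical)."""
    name: str      # symbol name, invented for this example
    size: int      # size in bytes, assumed positive
    accesses: int  # profiled access count, invented for this example


def compile_time_rank(objects):
    """Compile-time step: order objects by accesses per byte, densest first.

    This ordering does not depend on the SPM size, so it can be computed
    once and shipped with the executable.
    """
    return sorted(objects, key=lambda o: o.accesses / o.size, reverse=True)


def install_time_allocate(ranked, spm_size):
    """Install-time step: greedily fill an SPM of the discovered size.

    Returns (names placed in SPM, names left in DRAM). Greedy filling is an
    assumed heuristic, not necessarily what the article's installer does.
    """
    in_spm, in_dram = [], []
    free = spm_size
    for obj in ranked:
        if obj.size <= free:
            in_spm.append(obj.name)
            free -= obj.size
        else:
            in_dram.append(obj.name)
    return in_spm, in_dram


# Hypothetical profile data.
ranked = compile_time_rank([
    ProgramObject("fir_loop", 256, 90_000),
    ProgramObject("coeff_table", 512, 40_000),
    ProgramObject("log_buffer", 1024, 500),
    ProgramObject("init_code", 128, 10),
])

# The same ranked list serves devices with different SPM sizes.
for spm_size in (512, 1024):
    print(spm_size, install_time_allocate(ranked, spm_size))
```

In this sketch the only work left for the installer is a single linear pass over a precomputed list, which mirrors the abstract's point that most of the preprocessing is moved to compile time to keep the install-time overhead low.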