Memory allocation for embedded systems with a compile-time-unknown scratch-pad size

Authors:
Nghi Nguyen; Angel Dominguez; Rajeev Barua
Affiliations:
University of Maryland, College Park, MD; University of Maryland, College Park, MD; University of Maryland, College Park, MD
Venue:
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Year:
2005

Abstract

This paper presents the first memory allocation scheme for embedded systems having scratch-pad memory whose size is unknown at compile time. A scratch-pad memory (SPM) is a fast compiler-managed SRAM that replaces the hardware-managed cache. Its use is motivated by its better real-time guarantees as compared to cache and by its significantly lower overheads in energy consumption, area and access time.
Existing data allocation schemes for SPM all require that the SPM size be known at compile time. Unfortunately, the resulting executable is tied to that SPM size and is not portable to processor implementations having a different SPM size. Such portability would be valuable in situations where programs for an embedded system are not burned into the system at the time of manufacture, but rather are downloaded onto it during deployment, either using a network or portable media such as memory sticks. Such post-deployment code updates are common in distributed networks and in personal hand-held devices. Different devices commonly have different SPM sizes because of the evolution in VLSI technology across years. The result is that SPM cannot be used in such situations with downloaded code.
To overcome this limitation, this work presents a compiler method whose resulting executable is portable across SPMs of any size. At run time, the executable places frequently used objects in SPM; it considers code, global variables and stack variables for placement. The allocation is decided by modified loader software before the program is first run, once the SPM size can be discovered. The loader then modifies the program binary based on the decided allocation. To keep the overhead low, much of the pre-processing for the allocation is done at compile time.
Results show that our benchmarks average a 36% speed increase versus an all-DRAM allocation, while the optimal static allocation scheme, which knows the SPM size at compile time and is thus an unachievable upper bound, is only slightly faster (41% faster than all-DRAM). Results also show that the overhead from our embedded loader averages about 1% in both code size and run time of our benchmarks.
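
The split between compile-time pre-processing and load-time allocation described in the abstract can be illustrated with a minimal sketch. The abstract does not state which allocation heuristic the loader applies, so the greedy accesses-per-byte ordering below is an assumption chosen for illustration, not the authors' algorithm. The `ProgramObject` record, both function names, and all object names, sizes and access counts are hypothetical. A real loader would also patch the program binary, which this sketch replaces with a simple placement map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgramObject:
    """A code or data object the loader may place (hypothetical record)."""
    name: str
    size: int       # size in bytes, assumed to be positive
    accesses: int   # profiled access count, gathered before deployment


def rank_at_compile_time(objects):
    """Compile-time step: order objects by accesses per byte, densest first.

    The ranking does not depend on the SPM size, so it can be computed once
    and reused on any device. (Assumed heuristic, not taken from the paper.)
    """
    return sorted(objects, key=lambda o: o.accesses / o.size, reverse=True)


def allocate_at_load_time(ranked, spm_size):
    """Load-time step: fill the SPM in ranked order once its size is known.

    Returns a dict mapping each object name to "SPM" or "DRAM".
    """
    placement = {}
    free = spm_size
    for obj in ranked:
        if obj.size <= free:
            placement[obj.name] = "SPM"
            free -= obj.size
        else:
            placement[obj.name] = "DRAM"
    return placement


# Illustrative objects; the names, sizes and counts are made up.
objects = [
    ProgramObject("fir_loop", size=256, accesses=90_000),
    ProgramObject("coeff_table", size=512, accesses=60_000),
    ProgramObject("log_buffer", size=1024, accesses=2_000),
    ProgramObject("init_code", size=768, accesses=100),
]

ranked = rank_at_compile_time(objects)

# The same ranked list serves devices with different SPM sizes.
small = allocate_at_load_time(ranked, spm_size=512)
large = allocate_at_load_time(ranked, spm_size=2048)
```

Under these made-up numbers, the 512-byte device keeps only `fir_loop` in SPM, while the 2048-byte device also holds `coeff_table` and `log_buffer`. The ranking is computed once and only the cheap fill loop runs on the device, which is in the spirit of keeping the load-time overhead low.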