Heterogeneous memory management for embedded systems

Authors:
Oren Avissar; Rajeev Barua; Dave Stewart
Affiliations:
University of Maryland, College Park, MD; University of Maryland, College Park, MD; Embedded Research Solutions, LLC, Columbia, MD
Venue:
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Year:
2001

Abstract

This paper presents a technique for the efficient compiler management of software-exposed heterogeneous memory. In many lower-end embedded chips, often used in micro-controllers and DSP processors, heterogeneous memory units such as scratch-pad SRAM, internal DRAM, external DRAM and ROM are visible directly to the software, without automatic management by a hardware caching mechanism. Instead, the memory units are mapped to different portions of the address space. Caches are avoided because of their cost and power consumption, and because they make it difficult to guarantee real-time performance. For this important class of embedded chips, the allocation of data to different memory units to maximize performance is the responsibility of the software.

Current practice typically leaves it to the programmer to partition the data among the different memory units. We present a compiler strategy that automatically partitions the data among the memory units. We show that this strategy is optimal among all static partitions for global and stack data, and a good heuristic for heap data. For global and stack data, the scheme is provably equal to or better than any other compiler scheme or set of programmer annotations. Preliminary results show the benefits of optimal allocation: with just 20% of the data in SRAM, the formulation is able to decrease the runtime by 39% on average for our benchmarks compared with allocating all data to slow memory, without any programmer involvement. For some programs, less than 5% of data in SRAM achieves a similar speedup.
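
To make the idea of a static data partition concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes only two memory units (a small fast SRAM and an unbounded slow memory), made-up access latencies, and hypothetical per-variable sizes and profiled access counts. Under those assumptions, choosing which variables go to SRAM reduces to a 0/1 knapsack problem, solved here by dynamic programming. The names `Var` and `partition` are illustrative.

```python
from typing import NamedTuple


class Var(NamedTuple):
    name: str
    size: int      # bytes occupied by the variable
    accesses: int  # profiled access count (hypothetical input)


def partition(variables, sram_capacity, sram_latency=1, slow_latency=10):
    """Pick the variables to place in SRAM so total access cycles are minimal.

    Returns (set of names placed in SRAM, total access cycles).
    The latencies are assumed values chosen for illustration.
    """
    gain_per_access = slow_latency - sram_latency
    # best[c] = (cycles saved, names chosen) using at most c bytes of SRAM
    best = [(0, frozenset())] * (sram_capacity + 1)
    for v in variables:
        gain = v.accesses * gain_per_access
        # Walk capacities downward so each variable is placed at most once.
        for c in range(sram_capacity, v.size - 1, -1):
            candidate = best[c - v.size][0] + gain
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - v.size][1] | {v.name})
    saved, in_sram = best[sram_capacity]
    all_slow = sum(v.accesses for v in variables) * slow_latency
    return set(in_sram), all_slow - saved


example = [
    Var("a", 4, 100),
    Var("b", 8, 50),
    Var("c", 4, 90),
    Var("d", 16, 10),
]
in_sram, cycles = partition(example, sram_capacity=8)
print(sorted(in_sram), cycles)
```

With 8 bytes of SRAM, the sketch places `a` and `c` in fast memory, because together they account for most accesses while fitting the capacity. A real system with more than two memory units, or with stack and heap data, would need a richer model than this two-way split.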