Dynamic allocation for scratch-pad memory using compile-time decisions

Authors:
Sumesh Udayakumaran;Angel Dominguez;Rajeev Barua
Affiliations:
University of Maryland, College Park, MD;University of Maryland, College Park, MD;University of Maryland, College Park, MD
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2006


Abstract

In this research, we propose a highly predictable, low-overhead, yet dynamic memory-allocation strategy for embedded systems with scratch-pad memory. A scratch pad is a fast, compiler-managed SRAM that replaces the hardware-managed cache. It is motivated by its better real-time guarantees compared with a cache and by its significantly lower overheads in energy consumption, area, and overall runtime, even with a simple allocation scheme. Existing scratch-pad allocation methods are primarily of two types. First, software-caching schemes emulate the workings of a hardware cache in software: instructions are inserted before each load/store to check the software-maintained cache tags. Such methods incur large overheads in runtime, code size, energy consumption, and SRAM space for tags and, just like hardware caches, deliver poor real-time guarantees. A second category of algorithms partitions variables at compile time between the two memory banks, the scratch pad and main memory. However, a drawback of such static allocation schemes is that they do not account for dynamic program behavior: a data allocation that never changes at runtime cannot achieve the full locality benefits of a cache. We propose a dynamic allocation methodology for global and stack data and program code that (i) accounts for changing program requirements at runtime, (ii) has no software-caching tags, (iii) requires no runtime checks, (iv) has extremely low overheads, and (v) yields 100% predictable memory access times. In this method, data that is about to be accessed frequently is copied into the scratch pad using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary. Compared with a provably optimal static allocation, our scheme reduces average runtime by up to 39.8% and average energy by up to 31.3% across our benchmarks, depending on the SRAM size used.
The actual gain depends on the SRAM size, but our results show that close to the maximum benefit in runtime and energy is achieved for a substantial range of small SRAM sizes commonly found in embedded systems. Our comparison with a direct-mapped cache shows that our method performs roughly as well as a cached architecture.
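
The contrast the abstract draws between a static allocation and one that changes at fixed program points can be illustrated with a toy cost model. The sketch below is not the authors' algorithm: the latencies, the per-byte copy cost, the variable names, and the brute-force placement policy are all assumptions chosen for illustration. It also re-chooses the scratch-pad contents at every phase boundary without weighing the copy cost against the benefit, which a practical scheme would need to do.

```python
from itertools import combinations

# All constants are hypothetical, chosen only to make the example concrete.
SRAM_CYCLES = 1           # assumed scratch-pad access latency
DRAM_CYCLES = 10          # assumed main-memory access latency
COPY_CYCLES_PER_BYTE = 2  # assumed cost of moving one byte between memories


def best_fit(sizes, accesses, capacity):
    """Brute-force the set of variables that fits in `capacity` bytes
    and covers the most accesses (fine for a handful of variables)."""
    names = list(sizes)
    best, best_hits = frozenset(), 0
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            if sum(sizes[v] for v in subset) > capacity:
                continue
            hits = sum(accesses.get(v, 0) for v in subset)
            if hits > best_hits:
                best, best_hits = frozenset(subset), hits
    return best


def access_cost(accesses, in_sram):
    """Cycles spent on the accesses of one phase for a given placement."""
    return sum(n * (SRAM_CYCLES if v in in_sram else DRAM_CYCLES)
               for v, n in accesses.items())


def static_cost(sizes, phases, capacity):
    """One placement for the whole program, chosen from total access counts."""
    totals = {}
    for phase in phases:
        for v, n in phase.items():
            totals[v] = totals.get(v, 0) + n
    placement = best_fit(sizes, totals, capacity)
    return sum(access_cost(phase, placement) for phase in phases)


def dynamic_cost(sizes, phases, capacity):
    """Re-choose the placement at each phase boundary and pay for the copies.
    Evicted variables are conservatively assumed dirty and written back."""
    resident, total = frozenset(), 0
    for phase in phases:
        wanted = best_fit(sizes, phase, capacity)
        moved = (wanted - resident) | (resident - wanted)
        total += sum(sizes[v] for v in moved) * COPY_CYCLES_PER_BYTE
        total += access_cost(phase, wanted)
        resident = wanted
    return total


# Two phases with different hot variables and a 64-byte scratch pad.
sizes = {"a": 64, "b": 64, "c": 32}
phases = [{"a": 1000, "c": 10}, {"b": 1000, "c": 10}]
print(static_cost(sizes, phases, 64))   # 11200
print(dynamic_cost(sizes, phases, 64))  # 2584
```

In this example the static placement can hold only one of the two hot arrays, so the second phase runs entirely from main memory. The dynamic variant pays a fixed, known copy cost at the phase boundary and then serves both hot arrays from the scratch pad, with no tags and no per-access checks in the model.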