A hardware/software framework for instruction and data scratchpad memory allocation

Authors:
Zhong-Ho Chen;Alvin W. Y. Su
Affiliations:
National Cheng-Kung University, Taiwan;National Cheng-Kung University, Taiwan
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2010

Citing 35
Cited 2

Profile guided code positioning

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A retargetable compiler for ANSI C

ACM SIGPLAN Notices
A fully associative software-managed cache design

Proceedings of the 27th annual international symposium on Computer architecture
On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Dynamic management of scratch-pad memory space

Proceedings of the 38th annual Design Automation Conference
Exploiting scratch-pad memory using Presburger formulas

Proceedings of the 14th international symposium on Systems synthesis
Storage allocation for embedded processors

CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Heterogeneous memory management for embedded systems

CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Reducing energy consumption by dynamic copying of instructions onto onchip memory

Proceedings of the 15th international symposium on System Synthesis
An optimal memory allocation scheme for scratch-pad-based embedded systems

ACM Transactions on Embedded Computing Systems (TECS)
Effective Hardware-Based Data Prefetching for High-Performance Processors

IEEE Transactions on Computers
Scratchpad memory: design alternative for cache on-chip memory in embedded systems

Proceedings of the tenth international symposium on Hardware/software codesign
Efficient Utilization of Scratch-Pad Memory in Embedded Processor Applications

EDTC '97 Proceedings of the 1997 European conference on Design and Test
Temporal-Based Procedure Reordering for Improved Instruction Cache Performance

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Assigning Program and Data Objects to Scratchpad for Energy Reduction

Proceedings of the conference on Design, automation and test in Europe
Compiler-decided dynamic memory allocation for scratch-pad based embedded systems

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Cache-Aware Scratchpad Allocation Algorithm

Proceedings of the conference on Design, automation and test in Europe - Volume 2
An integrated hardware/software approach for run-time scratchpad management

Proceedings of the 41st annual Design Automation Conference
Dynamic overlay of scratchpad memory for energy minimization

Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Layer Assignment echniques for Low Energy in Multi-Layered Memory Organisations

DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Compiler-optimized usage of partitioned memories

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Memory allocation for embedded systems with a compile-time-unknown scratch-pad size

Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Hardware/software managed scratchpad memory for embedded system

Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
A novel instruction scratchpad memory optimization method based on concomitance metric

ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
MiBench: A free, commercially representative embedded benchmark suite

WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
An integrated scratch-pad allocator for affine and non-affine code

Proceedings of the conference on Design, automation and test in Europe: Proceedings
Dynamic allocation for scratch-pad memory using compile-time decisions

ACM Transactions on Embedded Computing Systems (TECS)
Scratchpad memory management for portable systems with a memory management unit

EMSOFT '06 Proceedings of the 6th ACM & IEEE International conference on Embedded software
Heap data allocation to scratch-pad memory in embedded systems

Journal of Embedded Computing - Cache exploitation in embedded systems
A novel technique to use scratch-pad memory for stack management

Proceedings of the conference on Design, automation and test in Europe
Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications

SCOPES '07 Proceedingsof the 10th international workshop on Software & compilers for embedded systems
Program restructuring for virtual memory

IBM Systems Journal
Application-driven synthesis of memory-intensive systems-on-chip

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
An efficient profile-based algorithm for scratchpad memory partitioning

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Processor energy characterization for compiler-assisted software energy reduction

Journal of Electrical and Computer Engineering
Energy-efficient multithreading for a hierarchical heterogeneous multicore through locality-cognizant thread generation

Journal of Parallel and Distributed Computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Previous researches show that a scratchpad memory device consumed less energy than a cache device with the same capacity. In this article, we locate the scratchpad memory (SPM) in the top level of the memory hierarchy to reduce the energy consumption. To take the advantage of a SPM, we address two issues of utilizing a SPM. First, the program's locality should be improved. The second issue is SPM management. To tackle these two issues, we present a hardware/software framework for dynamically allocating both instructions and data in SPM. The software flow could be divided into three phases: locality improving, locality extraction, and runtime SPM management. Without modifying the original compiler and the source code, we improve the locality of a program. An optimization algorithm is proposed to extract the SPM allocations. At runtime, an SPM management program is employed. In hardware, an address translation logic (ATL) is proposed to reduce the overhead of SPM management. The results show that the proposed framework can reduce energy delay product (EDP) by 63%, on average, when compared with the traditional cache architecture. The reduction in EDP is contributed by properly allocating both instructions and data in SPM. By allocating only instructions in SPM, the EDPs are reduced by 45%, on average. By allocating only data in SPM, the EDPs are reduced by 14%, on average.