Fine-grain dynamic instruction placement for L0 scratch-pad memory

Authors:
Jongsoo Park;James Balfour;William James Dally
Affiliations:
Stanford University, Stanford, CA, USA;NVIDIA Research, Santa Clara, CA, USA;Stanford University, Stanford, CA, USA
Venue:
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Year:
2010

Citing 39
Cited 0

Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Profile guided code positioning

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Iterative modulo scheduling: an algorithm for software pipelining loops

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Nesting of reducible and irreducible loops

ACM Transactions on Programming Languages and Systems (TOPLAS)
The filter cache: an energy efficient memory structure

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Instruction fetch energy reduction using loop caches for embedded applications with small tight loops

ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Characterizations of Reducible Flow Graphs

Journal of the ACM (JACM)
Procedure placement using temporal-ordering information

ACM Transactions on Programming Languages and Systems (TOPLAS)
An algorithm for reduction of operator strength

Communications of the ACM
Reducing energy consumption by dynamic copying of instructions onto onchip memory

Proceedings of the 15th international symposium on System Synthesis
Scratchpad memory: design alternative for cache on-chip memory in embedded systems

Proceedings of the tenth international symposium on Hardware/software codesign
Efficient Utilization of Scratch-Pad Memory in Embedded Processor Applications

EDTC '97 Proceedings of the 1997 European conference on Design and Test
Computer Architecture: A Quantitative Approach

Computer Architecture: A Quantitative Approach
Assigning Program and Data Objects to Scratchpad for Energy Reduction

Proceedings of the conference on Design, automation and test in Europe
Tiny instruction caches for low power embedded systems

ACM Transactions on Embedded Computing Systems (TECS)
Polynomial-time algorithm for on-chip scratchpad memory partitioning

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Cache-Aware Scratchpad Allocation Algorithm

Proceedings of the conference on Design, automation and test in Europe - Volume 2
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Dynamic overlay of scratchpad memory for energy minimization

Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A post-compiler approach to scratchpad mapping of code

Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Compiler Managed Dynamic Instruction Placement in a Low-Power Code Cache

Proceedings of the international symposium on Code generation and optimization
Compiler-optimized usage of partitioned memories

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Clustered Loop Buffer Organization for Low Energy VLIW Embedded Processors

IEEE Transactions on Computers
Improving Program Efficiency by Packing Instructions into Registers

Proceedings of the 32nd annual international symposium on Computer Architecture
Reducing Instruction Fetch Cost by Packing Instructions into RegisterWindows

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A novel instruction scratchpad memory optimization method based on concomitance metric

ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
MiBench: A free, commercially representative embedded benchmark suite

WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Dynamic allocation for scratch-pad memory using compile-time decisions

ACM Transactions on Embedded Computing Systems (TECS)
A dynamic code placement technique for scratchpad memory using postpass optimization

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Addressing instruction fetch bottlenecks by using an instruction register file

Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Dynamic scratchpad memory management for code in portable systems with an MMU

ACM Transactions on Embedded Computing Systems (TECS)
Guaranteeing Hits to Improve the Efficiency of a Small Instruction Cache

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
An Energy-Efficient Processor Architecture for Embedded Systems

IEEE Computer Architecture Letters
Hierarchical Instruction Register Organization

IEEE Computer Architecture Letters
Memory allocation for embedded systems with a compile-time-unknown scratch-pad size

ACM Transactions on Embedded Computing Systems (TECS)
Compilers: Principles, Techniques, & Tools with Gradiance

Compilers: Principles, Techniques, & Tools with Gradiance
A study of replacement algorithms for a virtual-storage computer

IBM Systems Journal
SDRM: simultaneous determination of regions and function-to-region mapping for scratchpad memories

HiPC'08 Proceedings of the 15th international conference on High performance computing
Overlay techniques for scratchpad memories in low power embedded processors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present a fine-grain dynamic instruction placement algorithm for small L0 scratch-pad memories (SPMs), whose unit of transfer can be an individual instruction. Our algorithm captures a large fraction of instruction reuse missed by coarse-grain placement algorithms whose unit of transfer is restricted to loops or functions within the capacity of SPMs. Evaluation of L0 SPMs with our fine-grain algorithm in 17 applications shows that the energy consumed by instruction storage hierarchy is reduced by 38% and 31% compared to that of L0 instruction caches and L0 SPMs with an ideal coarse-grain algorithm, respectively.