Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Nesting of reducible and irreducible loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
The filter cache: an energy efficient memory structure
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Characterizations of Reducible Flow Graphs
Journal of the ACM (JACM)
Procedure placement using temporal-ordering information
ACM Transactions on Programming Languages and Systems (TOPLAS)
An algorithm for reduction of operator strength
Communications of the ACM
Reducing energy consumption by dynamic copying of instructions onto onchip memory
Proceedings of the 15th international symposium on System Synthesis
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
Efficient Utilization of Scratch-Pad Memory in Embedded Processor Applications
EDTC '97 Proceedings of the 1997 European conference on Design and Test
Computer Architecture: A Quantitative Approach
Computer Architecture: A Quantitative Approach
Assigning Program and Data Objects to Scratchpad for Energy Reduction
Proceedings of the conference on Design, automation and test in Europe
Tiny instruction caches for low power embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Polynomial-time algorithm for on-chip scratchpad memory partitioning
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Cache-Aware Scratchpad Allocation Algorithm
Proceedings of the conference on Design, automation and test in Europe - Volume 2
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Dynamic overlay of scratchpad memory for energy minimization
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A post-compiler approach to scratchpad mapping of code
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Compiler Managed Dynamic Instruction Placement in a Low-Power Code Cache
Proceedings of the international symposium on Code generation and optimization
Compiler-optimized usage of partitioned memories
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Clustered Loop Buffer Organization for Low Energy VLIW Embedded Processors
IEEE Transactions on Computers
Improving Program Efficiency by Packing Instructions into Registers
Proceedings of the 32nd annual international symposium on Computer Architecture
Reducing Instruction Fetch Cost by Packing Instructions into RegisterWindows
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A novel instruction scratchpad memory optimization method based on concomitance metric
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Dynamic allocation for scratch-pad memory using compile-time decisions
ACM Transactions on Embedded Computing Systems (TECS)
A dynamic code placement technique for scratchpad memory using postpass optimization
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Addressing instruction fetch bottlenecks by using an instruction register file
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Dynamic scratchpad memory management for code in portable systems with an MMU
ACM Transactions on Embedded Computing Systems (TECS)
Guaranteeing Hits to Improve the Efficiency of a Small Instruction Cache
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
An Energy-Efficient Processor Architecture for Embedded Systems
IEEE Computer Architecture Letters
Hierarchical Instruction Register Organization
IEEE Computer Architecture Letters
Memory allocation for embedded systems with a compile-time-unknown scratch-pad size
ACM Transactions on Embedded Computing Systems (TECS)
Compilers: Principles, Techniques, & Tools with Gradiance
Compilers: Principles, Techniques, & Tools with Gradiance
A study of replacement algorithms for a virtual-storage computer
IBM Systems Journal
SDRM: simultaneous determination of regions and function-to-region mapping for scratchpad memories
HiPC'08 Proceedings of the 15th international conference on High performance computing
Overlay techniques for scratchpad memories in low power embedded processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hi-index | 0.00 |
We present a fine-grain dynamic instruction placement algorithm for small L0 scratch-pad memories (SPMs), whose unit of transfer can be an individual instruction. Our algorithm captures a large fraction of instruction reuse missed by coarse-grain placement algorithms whose unit of transfer is restricted to loops or functions within the capacity of SPMs. Evaluation of L0 SPMs with our fine-grain algorithm in 17 applications shows that the energy consumed by instruction storage hierarchy is reduced by 38% and 31% compared to that of L0 instruction caches and L0 SPMs with an ideal coarse-grain algorithm, respectively.