A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor
Digital Technical Journal
Reducing energy consumption by dynamic copying of instructions onto onchip memory
Proceedings of the 15th international symposium on System Synthesis
An optimal memory allocation scheme for scratch-pad-based embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
A study of branch prediction strategies
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Assigning Program and Data Objects to Scratchpad for Energy Reduction
Proceedings of the conference on Design, automation and test in Europe
A post-compiler approach to scratchpad mapping of code
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Dynamic allocation for scratch-pad memory using compile-time decisions
ACM Transactions on Embedded Computing Systems (TECS)
Scratchpad memory management for portable systems with a memory management unit
EMSOFT '06 Proceedings of the 6th ACM & IEEE International conference on Embedded software
Overlay techniques for scratchpad memories in low power embedded processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A software-only solution to use scratch pads for stack data
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Fine-grain dynamic instruction placement for L0 scratch-pad memory
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
A performance model and code overlay generator for scratchpad enhanced embedded processors
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Heap data management for limited local memory (LLM) multi-core processors
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Branch penalty reduction on IBM cell SPUs via software branch hinting
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Scheduling of synchronous data flow models on scratchpad memory based embedded processors
Proceedings of the International Conference on Computer-Aided Design
Automatic code overlay generation and partially redundant code fetch elimination
ACM Transactions on Architecture and Code Optimization (TACO)
An automatic code overlaying technique for multicores with explicitly-managed memory hierarchies
Proceedings of the Tenth International Symposium on Code Generation and Optimization
A software-only scheme for managing heap data on limited local memory(LLM) multicore processors
ACM Transactions on Embedded Computing Systems (TECS)
Scheduling of synchronous data flow models onto scratchpad memory-based embedded processors
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on ESTIMedia'10
CMSM: an efficient and effective code management for software managed multicores
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Hi-index | 0.00 |
Many programmable embedded systems feature low power processorscoupled with fast compiler controlled on-chip scratchpad memories (SPMs) toreduce their energy consumption. SPMs are more efficient than caches in termsof energy consumption, performance, area and timing predictability. However,unlike caches SPMs need explicit management by software, the quality ofwhich can impact the performance of SPM based systems. In this paper, wepresent a fully-automated, dynamic code overlaying technique for SPMs basedon pure static analysis. Static analysis is less restrictive than profiling and canbe easily extended to general compiler framework where the time consumingand expensive task of profiling may not be feasible. The SPM code mappingproblem is harder than bin packing problem, which is NP-complete. Therefore weformulate the SPMcode mapping as a binary integer linear programming problemand also propose a heuristic, determining simultaneously the region (bin) sizesas well as the function-to-region mapping. To the best of our knowledge, thisis the first heuristic which simultaneously solves the interdependent problemsof region size determination and the function-to-region mapping. We evaluateour approach for a set of MiBench applications on a horizontally split I-cache and SPM architecture (HSA). Compared to a cache-only architecture (COA),the HSA gives an average energy reduction of 35%, with minimal performancedegradation. For the HSA, we also compare the energy results from our proposedSDRM heuristic against a previous static analysis based mapping heuristic andobserve an average 27% energy reduction.