Program optimization for instruction caches
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Procedure merging with instruction caches
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
IMPACT: an architectural framework for multiple-instruction-issue processors
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Digital integrated circuits: a design perspective
Digital integrated circuits: a design perspective
Identifying loops using DJ graphs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Nesting of reducible and irreducible loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
The filter cache: an energy efficient memory structure
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Procedure placement using temporal-ordering information
ACM Transactions on Programming Languages and Systems (TOPLAS)
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
On loops, dominators, and dominance frontier
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Profiling in the ASP codesign environment
Journal of Systems Architecture: the EUROMICRO Journal
Compiler-directed scratch pad memory hierarchy design and management
Proceedings of the 39th annual Design Automation Conference
Communication architecture based power management for battery efficient system design
Proceedings of the 39th annual Design Automation Conference
Reducing energy consumption by dynamic copying of instructions onto onchip memory
Proceedings of the 15th international symposium on System Synthesis
An optimal memory allocation scheme for scratch-pad-based embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
Optimal Code Placement of Embedded Software for Instruction Caches
EDTC '96 Proceedings of the 1996 European conference on Design and Test
Efficient Utilization of Scratch-Pad Memory in Embedded Processor Applications
EDTC '97 Proceedings of the 1997 European conference on Design and Test
VLSID '03 Proceedings of the 16th International Conference on VLSI Design
A data alignment technique for improving cache performance
ICCD '97 Proceedings of the 1997 International Conference on Computer Design (ICCD '97)
Memory Organization for Improved Data Cache Performance in Embedded Processors
ISSS '96 Proceedings of the 9th international symposium on System synthesis
Assigning Program and Data Objects to Scratchpad for Energy Reduction
Proceedings of the conference on Design, automation and test in Europe
Compiler-decided dynamic memory allocation for scratch-pad based embedded systems
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Polynomial-time algorithm for on-chip scratchpad memory partitioning
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
A post-compiler approach to scratchpad mapping of code
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Rapid Embedded Hardware/Software System Generation
VLSID '05 Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design
A quantitative study and estimation models for extensible instructions in embedded processors
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Hardware/software managed scratchpad memory for embedded system
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Custom-instruction synthesis for extensible-processor platforms
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
A smart random code injection to mask power analysis based side channel attacks
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Predictable dynamic instruction scratchpad for simultaneous multithreaded processors
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
HitME: low power Hit MEmory buffer for embedded systems
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Results on leakage power management in scratchpad-based embedded systems
CSS '07 Proceedings of the Fifth IASTED International Conference on Circuits, Signals and Systems
Partitioning and allocation of scratch-pad memory for priority-based preemptive multi-task systems
Proceedings of the Conference on Design, Automation and Test in Europe
Randomized Instruction Injection to Counter Power Analysis Attacks
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
A method to both reduce energy and improve performance in a processor-based embedded system is described in this paper. Comprising of a scratchpad memory instead of an instruction cache, the target system dynamically (at runtime) copies into the scratchpad code segments that are determined to be beneficial (in terms of energy efficiency and/or speed) to execute from the scratchpad. We develop a heuristic algorithm to select such code segments based on a metric, called concomitance. Concomitance is derived from the temporal relationships of instructions. A hardware controller is designed and implemented for managing the scratchpad memory. Strategically placed custom instructions in the program inform the hardware controller when to copy instructions from the main memory to the scratchpad. A novel heuristic algorithm is implemented for determining locations within the program where to insert these custom instructions. For a set of realistic benchmarks, experimental results indicate the method uses 41.9% lower energy (on average) and improves performance by 40.0% (on average) when compared to a traditional cache system which is identical in size.