Energy efficiency is rapidly becoming a first-class optimization parameter for modern systems. Caches are critical to overall performance, so modern processors (both high-end and low-end) tend to deploy large caches with a high degree of associativity. Because of this size, cache power accounts for a significant percentage of total system power. One important way to reduce cache power consumption is to reduce dynamic activity in the cache by reducing dynamic load/store counts. In this work, we focus on programs that are available only as binaries and need to be improved for energy efficiency. To adapt these programs for energy-constrained devices, we propose a feedback-directed post-pass solution that re-allocates registers to reduce dynamic load/store counts and improve energy efficiency. Our approach assumes no knowledge of the original code generator or compiler and performs a post-pass register allocation to produce a more power-efficient binary. We identify dead and unused registers in the binary and re-allocate them on hot paths to reduce dynamic load/store counts. We show that the static code size increase due to our framework is minimal. Our experiments on SPEC2000 and MediaBench show that our technique is effective: dynamic spill loads/stores in the data cache are reduced by 0% to 26.4%. Overall, our approach improves the energy-delay product of the program.
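The core idea of finding registers that are dead or unused on a hot path can be illustrated with a small sketch. This is not the paper's algorithm: the toy instruction format (`Instr`), the register names, and the helper `free_registers` are all hypothetical, and the hot region is simplified to a straight-line instruction sequence with a known live-out set. Under those assumptions, a backward liveness walk marks every register that is live or defined anywhere in the region, and whatever remains is a candidate to hold a spilled value in place of its stack slot.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical toy instruction: an opcode plus the registers it writes and reads."""
    op: str
    defs: frozenset
    uses: frozenset


def free_registers(region, all_regs, live_out):
    """Return registers that are never live and never defined inside `region`.

    `region` is a straight-line list of Instr (an assumption; real hot paths
    need full control-flow liveness). `live_out` is the set of registers live
    on exit from the region.
    """
    live = set(live_out)
    busy = set(live_out)
    for ins in reversed(region):
        # Standard backward liveness step: live_in = (live_out - defs) | uses
        live -= ins.defs
        live |= ins.uses
        # Conservatively treat anything live or written here as unavailable.
        busy |= live | ins.defs
    return sorted(set(all_regs) - busy)


ALL_REGS = ["r0", "r1", "r2", "r3", "r4", "r5"]

# A hot region that reloads and re-spills a value through a stack slot.
region = [
    Instr("spill_load", frozenset({"r1"}), frozenset()),
    Instr("add", frozenset({"r2"}), frozenset({"r0", "r1"})),
    Instr("spill_store", frozenset(), frozenset({"r2"})),
]

free = free_registers(region, ALL_REGS, live_out={"r0", "r3"})
print(free)
```

In this toy example the registers `r4` and `r5` are never touched by the region, so either could keep the spilled value resident across the hot path, removing the `spill_load` and `spill_store` from every execution of it. A real post-pass tool would also have to respect calling conventions and save or restore the chosen register at the region boundaries where needed.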