Binary translation to improve energy efficiency through post-pass register re-allocation

Authors:
Kun Zhang;Tao Zhang;Santosh Pande
Affiliations:
Georgia Institute of Technology, Atlanta, GA;Georgia Institute of Technology, Atlanta, GA;Georgia Institute of Technology, Atlanta, GA
Venue:
Proceedings of the 4th ACM international conference on Embedded software
Year:
2004

Abstract

Energy efficiency is rapidly becoming a first-class optimization parameter for modern systems. Caches are critical to overall performance, and thus modern processors (both high-end and low-end) tend to deploy large caches with a high degree of associativity. Because of this large size, cache power accounts for a significant percentage of total system power. One important way to reduce cache power consumption is to reduce dynamic activity in the cache by reducing dynamic load/store counts. In this work, we focus on programs that are available only as binaries and need to be improved for energy efficiency. To adapt these programs to energy-constrained devices, we propose a feedback-directed post-pass solution that performs register re-allocation to reduce dynamic load/store counts and improve energy efficiency. Our approach assumes zero knowledge of the original code generator or compiler and performs a post-pass register allocation to produce a more power-efficient binary. We attempt to find dead as well as unused registers in the binary and then re-allocate them on hot paths to reduce dynamic load/store counts. We show that the static code size increase due to our framework is minimal. Our experiments on SPEC2000 and MediaBench show that our technique is effective: the reduction in dynamic spill loads/stores to the data cache ranges from 0% to 26.4%. Overall, our approach improves the energy-delay product of the program.
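
The core idea in the abstract (find registers that are dead or unused, then give them to the most frequently accessed spill slots on hot paths) can be illustrated with a small sketch. This is not the authors' implementation: the `Instr` record, the four-register machine `ALL_REGS`, the slot names, and the profile counts are all hypothetical, and the sketch handles only a single straight-line block rather than a whole binary.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical 4-register machine, used only for illustration.
ALL_REGS = frozenset({"r0", "r1", "r2", "r3"})


@dataclass(frozen=True)
class Instr:
    op: str
    defs: frozenset = frozenset()   # registers written
    uses: frozenset = frozenset()   # registers read
    slot: Optional[str] = None      # stack slot touched by a spill load/store
    count: int = 0                  # execution count from a profile run


def live_in_sets(block, live_out):
    """Backward liveness over one straight-line block."""
    live = set(live_out)
    result = []
    for ins in reversed(block):
        live -= ins.defs
        live |= ins.uses
        result.append(frozenset(live))
    result.reverse()
    return result


def free_registers(block, live_out):
    """Registers neither referenced in the block nor live across it."""
    busy = set(live_out)
    for ins, live in zip(block, live_in_sets(block, live_out)):
        busy |= ins.defs | ins.uses | live
    return sorted(ALL_REGS - busy)


def plan_reallocation(block, live_out):
    """Map the hottest spill slots to free registers, hottest first."""
    traffic = {}
    for ins in block:
        if ins.slot is not None:
            traffic[ins.slot] = traffic.get(ins.slot, 0) + ins.count
    hot_slots = sorted(traffic, key=lambda s: (-traffic[s], s))
    plan = dict(zip(hot_slots, free_registers(block, live_out)))
    saved = sum(traffic[s] for s in plan)
    return plan, saved


block = [
    Instr("load", defs=frozenset({"r1"}), slot="slot_a", count=900),
    Instr("add", defs=frozenset({"r0"}), uses=frozenset({"r0", "r1"}), count=900),
    Instr("load", defs=frozenset({"r1"}), slot="slot_b", count=40),
    Instr("sub", defs=frozenset({"r0"}), uses=frozenset({"r0", "r1"}), count=40),
]
# r3 is never referenced in the block but is live across it, so it is not free.
plan, saved = plan_reallocation(block, live_out=frozenset({"r0", "r3"}))
print(plan, saved)
```

In this toy block only `r2` is both unreferenced and dead, so it is assigned to `slot_a`, the slot with the most profiled accesses. A real post-pass tool would additionally need control-flow analysis, calling-convention constraints, and code rewriting to keep the register and the memory slot consistent, none of which is modeled here.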