Optimal register reassignment for register stack overflow minimization

Authors:
Yoonseo Choi;Hwansoo Han
Affiliations:
Korea Advanced Institute of Science and Technology, Daejeon, Korea;Korea Advanced Institute of Science and Technology, Daejeon, Korea
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2006

Citing 14
Cited 1

Global register allocation at link time

SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Minimizing register usage penalty at procedure calls

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
A simple interprocedural register allocation algorithm and its effectiveness for LISP

ACM Transactions on Programming Languages and Systems (TOPLAS)
The SPARC architecture manual (version 9)

The SPARC architecture manual (version 9)
Improvements to graph coloring register allocation

ACM Transactions on Programming Languages and Systems (TOPLAS)
Minimum cost interprocedural register allocation

POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Call-cost directed register allocation

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Register allocation by priority-based coloring

SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Optimization for the Intel® Itanium® architecture register stack

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Inter-procedural stacked register allocation for Itanium® like architecture

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Register allocation & spilling via graph coloring

SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
Quantitative Evaluation of the Register Stack Engine and Optimizations for Future Itanium Processors

INTERACT '02 Proceedings of the Sixth Annual Workshop on Interaction between Compilers and Computer Architectures
Compiler Optimizations for Transaction Processing Workloads on Itanium® Linux Systems

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Fine-grain stacked register allocation for the Itanium architecture

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing

Rethinking Java call stack design for tiny embedded devices

Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems

Abstract

Architectures with a register stack can implement efficient calling conventions. Because callers' and callees' register frames overlap, callers can pass parameters to callees without using a memory stack. The most recent instance of a register stack can be found in the Intel Itanium architecture. A hardware component called the register stack engine (RSE) provides the illusion of an infinite-length register stack, using a memory-backed process to handle overflow and underflow for a physically limited number of registers. Despite such hardware support, some applications suffer from the overhead required to handle register stack overflow and underflow. The memory latency associated with register stack overflow and underflow can be reduced by generating multiple register allocation instructions within a procedure [Settle et al. 2003]. Liveness analysis is used to find the set of registers that are not required to keep their values across procedure boundaries. However, among those dead registers, only the ones located consecutively in a certain part of the register stack frame can be removed. We propose a compiler-supported register reassignment technique that further reduces RSE overflow and underflow. By reassigning registers based on liveness analysis, our technique forces as many dead registers as possible to be removed. We define the problem of optimal register reassignment, which minimizes interprocedural register stack heights while considering multiple call sites within a procedure. We show how this problem is related to a path-finding problem in a graph called a sequence graph. We also propose an efficient heuristic algorithm for the problem. Finally, we measure the effects of the proposed techniques on the SPEC CINT2000 benchmark suite and analyze the results. The results show that our approach reduces RSE cycles by 6.4% and total CPU cycles by 1.7% on average.
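
The effect of reassignment on register stack height can be illustrated with a toy model. The sketch below is not the paper's sequence-graph formulation or its heuristic. It assumes that only a trailing run of dead registers can be dropped from the frame at a call site, and it uses a hypothetical greedy ordering (registers live at more call sites are placed lower in the frame). The names `frame_height`, `total_height`, and `reassign_by_live_count`, the register names, and the live sets are all invented for illustration.

```python
from typing import List, Set


def frame_height(order: List[str], live: Set[str]) -> int:
    """Frame size needed at one call site under the toy model.

    Assumption: only dead registers at the high end of the frame can be
    removed, so the frame must extend up to the highest live register.
    """
    height = 0
    for pos, reg in enumerate(order, start=1):
        if reg in live:
            height = pos
    return height


def total_height(order: List[str], live_sets: List[Set[str]]) -> int:
    """Sum of frame heights over all call sites in the procedure."""
    return sum(frame_height(order, live) for live in live_sets)


def reassign_by_live_count(regs: List[str], live_sets: List[Set[str]]) -> List[str]:
    """Hypothetical greedy reassignment, not the paper's algorithm.

    Registers that are live at more call sites are moved toward the low
    end of the frame. sorted() is stable, so ties keep their original order.
    """
    return sorted(regs, key=lambda r: -sum(r in live for live in live_sets))


# Illustrative data: five stacked registers and three call sites.
regs = ["r32", "r33", "r34", "r35", "r36"]
call_sites = [
    {"r32", "r36"},          # live across call site A
    {"r34"},                 # live across call site B
    {"r32", "r34", "r36"},   # live across call site C
]

before = total_height(regs, call_sites)
new_order = reassign_by_live_count(regs, call_sites)
after = total_height(new_order, call_sites)
print(before, new_order, after)
```

In this example the dead registers r33 and r35 are interleaved with live ones, so the original order keeps almost the whole frame allocated at every call. After reordering, the dead registers sit at the high end and can be dropped. A greedy ordering like this is not optimal in general when call sites have conflicting live sets, which is the case the optimal reassignment problem in the abstract addresses.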