Global register allocation at link time
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Minimizing register usage penalty at procedure calls
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
A simple interprocedural register allocation algorithm and its effectiveness for LISP
ACM Transactions on Programming Languages and Systems (TOPLAS)
The SPARC architecture manual (version 9)
The SPARC architecture manual (version 9)
Improvements to graph coloring register allocation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Minimum cost interprocedural register allocation
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Call-cost directed register allocation
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Register allocation by priority-based coloring
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Optimization for the Intel® Itanium® architecture register stack
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Inter-procedural stacked register allocation for itanium® like architecture
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Register allocation & spilling via graph coloring
SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
Quantitative Evaluation of the Register Stack Engine and Optimizations for Future Itanium Processors
INTERACT '02 Proceedings of the Sixth Annual Workshop on Interaction between Compilers and Computer Architectures
Compiler Optimizations for Transaction Processing Workloads on Itanium® Linux Systems
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Fine-grain stacked register allocation for the itanium architecture
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Rethinking Java call stack design for tiny embedded devices
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Hi-index | 0.00 |
Architectures with a register stack can implement efficient calling conventions. Using the overlapping of callers' and callees' registers, callers are able to pass parameters to callees without a memory stack. The most recent instance of a register stack can be found in the Intel Itanium architecture. A hardware component called the register stack engine (RSE) provides an illusion of an infinite-length register stack using a memory-backed process to handle overflow and underflow for a physically limited number of registers. Despite such hardware support, some applications suffer from the overhead required to handle register stack overflow and underflow. The memory latency associated with the overflow and underflow of a register stack can be reduced by generating multiple register allocation instructions within a procedure [Settle et al. 2003]. Live analysis is utilized to find a set of registers that are not required to keep their values across procedure boundaries. However, among those dead registers, only the registers that are consecutively located in a certain part of the register stack frame can be removed. We propose a compiler-supported register reassignment technique that reduces RSE overflow/underflow further. By reassigning registers based on live analysis, our technique forces as many dead registers to be removed as possible. We define the problem of optimal register reassignment, which minimizes interprocedural register stack heights considering multiple call sites within a procedure. We present how this problem is related to a path-finding problem in a graph called a sequence graph. We also propose an efficient heuristic algorithm for the problem. Finally, we present the measurement of effects of the proposed techniques on SPEC CINT2000 benchmark suite and the analysis of the results. The result shows that our approach reduces the RSE cycles by 6.4% and total cpu cycles by 1.7% on average.