Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures

Authors:
R. Govindarajan;Hongbo Yang;José Nelson Amaral;Chihong Zhang;Guang R. Gao
Venue:
IEEE Transactions on Computers
Year:
2003

Abstract

In this paper, we address the problem of generating an optimal instruction sequence S for a Directed Acyclic Graph (DAG), where S is optimal in terms of the number of registers used. We call this the Minimum Register Instruction Sequence (MRIS) problem. The motivation for revisiting the MRIS problem stems from several modern architecture innovations/requirements that have put the instruction sequencing problem in a new context. We develop an efficient heuristic solution for the MRIS problem. This solution is based on the notion of an instruction lineage: a set of instructions that can definitely share a single register. The formation of lineages exploits the structure of the dependence graph to facilitate the sharing of registers not only among instructions within a lineage, but also across lineages. Our efficient heuristics to "fuse" lineages further reduce the register requirement. This reduced register requirement results in a code sequence with fewer register spills. We have implemented our solution in the MIPSpro production compiler and measured its performance on the SPEC95 floating point benchmark suite. Our experimental results demonstrate that the proposed instruction sequencing method significantly reduces the number of spill loads and stores inserted in the code, by more than 50 percent in each of the benchmarks. Our approach reduces the average number of dynamic loads and stores executed by 10.4 percent and 3.7 percent, respectively. Further, our approach improves the execution time of the benchmarks by 3.2 percent on average. In order to evaluate how efficiently our heuristics find a near-optimal solution to the MRIS problem, we develop an elegant integer linear programming formulation for the MRIS problem. Using a commercial integer linear programming solver, we obtain the optimal solution for the MRIS problem.
Comparing the optimal solution from the integer linear programming tool with our heuristic solution reveals that, in a very large majority (99.2 percent) of the cases, our heuristic solution is optimal. For this experiment, we used a set of 675 dependence graphs representing basic blocks extracted from scientific benchmark programs.
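
To make the MRIS problem statement concrete, the following minimal Python sketch measures the register need of one instruction sequence and finds a minimum-register sequence for a tiny DAG by exhaustive search. It is not the lineage-based heuristic or the integer linear programming formulation described in the abstract; it only illustrates the objective being minimized. The function names, the example DAG, and the liveness model (each instruction defines one value, and a result may reuse the register of an operand whose last use is that instruction) are assumptions made for illustration.

```python
from itertools import permutations


def register_need(order, succs):
    """Peak number of live values when instructions execute in `order`.

    Assumed model: each instruction defines one value, which stays in a
    register until its last consumer executes. An operand that dies at an
    instruction frees its register for that instruction's result.
    """
    pos = {v: i for i, v in enumerate(order)}
    peak = 0
    for i, current in enumerate(order):
        live = 0
        for u in order[: i + 1]:
            last_use = max((pos[s] for s in succs[u]), default=pos[u])
            # The value just defined always needs a register; older values
            # count only if some consumer has not executed yet.
            if u == current or last_use > i:
                live += 1
        peak = max(peak, live)
    return peak


def respects_dependences(order, succs):
    """True if every instruction precedes all of its consumers."""
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[u] < pos[s] for u in succs for s in succs[u])


def min_register_sequence(succs):
    """Exhaustive search over all legal sequences (tiny DAGs only)."""
    best = None
    for order in permutations(succs):
        if respects_dependences(order, succs):
            need = register_need(order, succs)
            if best is None or need < best[0]:
                best = (need, order)
    return best


# Hypothetical DAG: g = (a + b) + (c + d), where a..d are loads.
succs = {
    "a": ["e"], "b": ["e"], "c": ["f"], "d": ["f"],
    "e": ["g"], "f": ["g"], "g": [],
}

print(register_need(list("abcdefg"), succs))  # all loads first: 4
print(min_register_sequence(succs)[0])        # best legal order: 3
```

Under these assumptions, issuing all four loads first needs four registers, whereas finishing one addition before starting the loads for the other needs only three. Exhaustive search grows factorially with the number of instructions, which is why a heuristic or an ILP solver is needed for realistic basic blocks.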