Recent results on the static single assignment (SSA) form open promising directions for the design of register allocation heuristics for just-in-time (JIT) compilation. In particular, tree-scan allocators with two decoupled phases, one for spilling and one for splitting/coloring/coalescing, seem good candidates for designing fast, memory-friendly, and competitive register allocators. Linear-scan allocators, introduced earlier, are also well suited to JIT compilation. All of these allocators perform live-range splitting (mostly on control-flow edges) to avoid spilling, but most of them coalesce poorly. The result is many register-to-register copies inside basic blocks and also, implicitly, on control-flow graph edges, which leads to edge splitting. This paper presents parallel copy motion, a technique for optimizing register-allocated code that amounts to moving a group of parallel copy instructions from one program point to another. While scheduling is constrained by data dependencies, a copy can "traverse" all instructions of a basic block thanks to register renaming, except those with conflicting naming constraints. With adequate management of compensation code, parallel copies can also be moved across edges. A first application is reducing the cost of copies through better placement. A second application is moving copies out of critical edges, i.e., edges going from a block with multiple successors to a block with multiple predecessors. This is often beneficial compared to the alternative, which is splitting the edge. A direct use case is the handling of control-flow graphs with non-splittable edges, introduced by some compilers for specific architectural constraints, region boundaries, or exception handling code.
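To make the idea of a copy "traversing" an instruction concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a toy machine where every instruction writes one register with the sum of the registers it reads, and it models a parallel copy as a `dst -> src` dictionary. The names `Instr`, `apply_copy`, `run_instr`, and `move_copy_below` are hypothetical and introduced only for illustration. The sketch renames only the instruction's uses and simply gives up when the instruction would overwrite a register the copy still has to read, which is one simple instance of a conflicting naming constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str    # register written by the instruction
    uses: tuple  # registers read by the instruction


def apply_copy(copy, env):
    """Parallel semantics: every source is read before any destination is written."""
    new_env = dict(env)
    for dst, src in copy.items():
        new_env[dst] = env[src]
    return new_env


def run_instr(instr, env):
    """Toy semantics (an assumption): dest receives the sum of the used registers."""
    new_env = dict(env)
    new_env[instr.dest] = sum(env[u] for u in instr.uses)
    return new_env


def move_copy_below(copy, instr):
    """Rewrite `copy; instr` as `instr'; copy'`, or return None on a conflict."""
    # A copy whose destination is overwritten by instr is dead, so drop it.
    remaining = {d: s for d, s in copy.items() if d != instr.dest}
    # Conflict: instr would clobber a register that the copy still has to read.
    if instr.dest in remaining.values():
        return None
    # A use that the copy would have defined is read from the copy's source instead.
    renamed = Instr(instr.dest, tuple(copy.get(u, u) for u in instr.uses))
    return renamed, remaining


env = {"a": 1, "b": 2, "c": 3}
swap = {"a": "b", "b": "a"}    # parallel swap of a and b
instr = Instr("c", ("a", "c"))  # c = a + c, placed after the swap

before = run_instr(instr, apply_copy(swap, env))
renamed, moved = move_copy_below(swap, instr)
after = apply_copy(moved, run_instr(renamed, env))
```

Here the instruction `c = a + c` becomes `c = b + c` once the swap is moved below it, and both orders leave the registers in the same final state. An instruction that writes `a` cannot be crossed by this simplified version, because the swap still needs the old value of `a`; a fuller treatment would also rename the destination or insert compensation code.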
Experiments with SPECint and our own benchmark suite show that an SSA-based register allocator can now be applied broadly, even to procedures with non-splittable edges: such procedures could not be compiled before, whereas with parallel copy motion all moves could be pushed out of such edges. Even simple strategies for moving copies out of edges and inside basic blocks show an average improvement over the standard edge-splitting strategy (3% speedup), with a large reduction in the weighted number of copies (21% move cost reduction for SPECint). This leads us to believe that the approach is promising, and not only for improving coalescing in fast register allocators.