Parallel copy motion

Authors:
Florent Bouchez;Quentin Colombet;Alain Darte;Fabrice Rastello;Christophe Guillon
Affiliations:
LIP, Lyon, France;LIP, Lyon, France;LIP, Lyon, France;LIP, Lyon, France;CEC compiler group, Grenoble, STMicroelectronics, France
Venue:
Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems
Year:
2010

Abstract

Recent results on the static single assignment (SSA) form open promising directions for the design of register allocation heuristics for just-in-time (JIT) compilation. In particular, tree-scan allocators with two decoupled phases, one for spilling and one for splitting/coloring/coalescing, seem good candidates for designing fast, memory-friendly, and competitive register allocators. Linear-scan allocators, introduced earlier, are also well suited to JIT compilation. All of them perform live-range splitting (mostly on control-flow edges) to avoid spilling, but most perform coalescing poorly. This leaves many register-to-register copies inside basic blocks and, implicitly, on control-flow graph edges, where they force edge splitting. This paper presents parallel copy motion, a technique for optimizing register-allocated code that amounts to moving a group of parallel copy instructions from one program point to another. While instruction scheduling is constrained by data dependencies, a copy can "traverse" all instructions of a basic block, thanks to register renaming, except those with conflicting naming constraints. With adequate management of compensation code, parallel copies can also be moved across edges. A first application is reducing the cost of copies through better placement. A second application is moving copies out of critical edges, i.e., edges going from a block with multiple successors to a block with multiple predecessors. This is often beneficial compared to the alternative, splitting the edge. A direct use case is the handling of control-flow graphs with non-splittable edges, introduced by some compilers for specific architectural constraints, region boundaries, or exception handling code.
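The idea that a copy can "traverse" an instruction thanks to register renaming can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes, for simplicity, that the parallel copy is a permutation of registers, models a single `add` operation, and represents naming constraints as a hypothetical `pinned` set of registers that an instruction cannot rename. The names `Instr`, `apply_pcopy`, `execute`, and `sink_below` are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str                          # only "add" is modeled in this sketch
    defs: tuple                      # registers written
    uses: tuple                      # registers read
    pinned: frozenset = frozenset()  # registers this instruction cannot rename


def apply_pcopy(regs, pcopy):
    """Run a parallel copy {dst: src}: all sources are read before any write."""
    out = dict(regs)
    for dst, src in pcopy.items():
        out[dst] = regs[src]
    return out


def execute(regs, instr):
    """Run one instruction on a register file (a dict from name to value)."""
    assert instr.op == "add"
    out = dict(regs)
    out[instr.defs[0]] = regs[instr.uses[0]] + regs[instr.uses[1]]
    return out


def sink_below(pcopy, instr):
    """Return instr2 such that `instr2; pcopy` behaves like `pcopy; instr`,
    or None when a naming constraint blocks the move."""
    # Assumption of this sketch: the copy permutes a set of registers.
    if sorted(pcopy) != sorted(pcopy.values()):
        raise ValueError("this sketch only handles permutations")

    def rename(r):
        # After the copy, register r holds what pcopy[r] held before it.
        return pcopy.get(r, r)

    if any(rename(r) != r for r in instr.pinned):
        return None  # conflicting naming constraint: the copy cannot pass
    return Instr(instr.op,
                 tuple(rename(r) for r in instr.defs),
                 tuple(rename(r) for r in instr.uses),
                 instr.pinned)


# Usage: move a swap of r1 and r2 below "r1 = add r1, r3".
swap = {"r1": "r2", "r2": "r1"}
add = Instr("add", defs=("r1",), uses=("r1", "r3"))
moved = sink_below(swap, add)  # the add is renamed to "r2 = add r2, r3"
```

Under the permutation assumption, both uses and definitions are renamed through the same map and the copy itself is unchanged. A non-permutation copy would need extra care, for example when the instruction overwrites a register that the copy still reads.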
Experiments with SPECint and our own benchmark suite show that an SSA-based register allocator can now be applied broadly, even to procedures with non-splittable edges: such procedures could not be compiled before, whereas with parallel copy motion all moves could be pushed out of these edges. Even simple strategies for moving copies out of edges and inside basic blocks show an average improvement over the standard edge-splitting strategy (3% speedup), with a large reduction in the weighted number of copies (21% move cost reduction for SPECint). This leads us to believe that the approach is promising, and not only for improving coalescing in fast register allocators.
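The critical edges that copies are moved out of are defined above as edges going from a block with multiple successors to a block with multiple predecessors. As a small illustration, the sketch below finds such edges in a control-flow graph given as a successor map. The representation and the name `critical_edges` are assumptions made for this example, not taken from the paper.

```python
def critical_edges(succs):
    """Return the edges (u, v) where u has several successors
    and v has several predecessors.

    `succs` maps each block name to the list of its successor blocks.
    """
    # Build the predecessor lists from the successor map.
    preds = {}
    for u, targets in succs.items():
        for v in targets:
            preds.setdefault(v, []).append(u)

    return [(u, v)
            for u, targets in succs.items()
            for v in targets
            if len(targets) > 1 and len(preds[v]) > 1]


# Usage: A branches to B or C, and B falls through to C.
cfg = {"A": ["B", "C"], "B": ["C"], "C": []}
edges = critical_edges(cfg)  # only A -> C is critical here
```

A copy placed on such an edge has no block of its own to live in: putting it at the end of `A` would also affect the path to `B`, and putting it at the start of `C` would also affect the path from `B`. This is why the usual alternative is to split the edge with a new block.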