Parallel copy motion

  • Authors:
  • Florent Bouchez; Quentin Colombet; Alain Darte; Fabrice Rastello; Christophe Guillon

  • Affiliations:
  • LIP, Lyon, France; LIP, Lyon, France; LIP, Lyon, France; LIP, Lyon, France; CEC compiler group, Grenoble, STMicroelectronics, France

  • Venue:
  • Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems
  • Year:
  • 2010

Abstract

Recent results on the static single assignment (SSA) form open promising directions for the design of register allocation heuristics for just-in-time (JIT) compilation. In particular, tree-scan allocators with two decoupled phases, one for spilling and one for splitting/coloring/coalescing, seem good candidates for designing fast, memory-friendly, and competitive register allocators. Linear-scan allocators, introduced earlier, are also well suited to JIT compilation. All of these allocators perform live-range splitting (mostly on control-flow edges) to avoid spilling, but most of them coalesce poorly. This leaves many register-to-register copies inside basic blocks and also, implicitly, on control-flow graph edges, where they force edge splitting. This paper presents parallel copy motion, a technique for optimizing register-allocated code that moves a group of parallel copy instructions from one program point to another. Whereas instruction scheduling is constrained by data dependencies, a copy can "traverse" every instruction of a basic block, thanks to register renaming, except instructions with conflicting naming constraints. With adequate management of compensation code, parallel copies can also be moved across edges. A first application is reducing the cost of copies through better placement. A second application is moving copies out of critical edges, i.e., edges going from a block with multiple successors to a block with multiple predecessors. This is often beneficial compared to the alternative, splitting the edge. A direct use case is the handling of control-flow graphs with non-splittable edges, which some compilers introduce for specific architectural constraints, region boundaries, or exception-handling code.
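The idea that a parallel copy can "traverse" an instruction thanks to register renaming can be illustrated with a small simulation. The sketch below is a hypothetical toy model, not the authors' algorithm: registers are plain names, a parallel copy is a dict mapping destination to source, and each instruction (`Instr`, an assumed type) writes `dst` with the sum of its `uses` plus one. Moving the copy below the instruction rewrites the instruction's uses through the copy, drops the copy into the overwritten register, and, if the copy still reads the instruction's destination, redirects the result to a register taken from `free_regs`, a caller-supplied list assumed to be dead at that point. Fixed-register (naming) constraints, which the abstract identifies as the obstacle to this motion, are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical toy instruction: dst := sum(uses) + 1."""
    dst: str
    uses: list


def run_parallel_copy(regs, pc):
    """Apply a parallel copy {dst: src}: all sources are read before any write."""
    snapshot = dict(regs)
    for dst, src in pc.items():
        regs[dst] = snapshot[src]


def run_instr(regs, ins):
    regs[ins.dst] = sum(regs[u] for u in ins.uses) + 1


def move_copy_down(pc, ins, free_regs):
    """Turn `pc; ins` into `ins2; pc2` with the same effect on live registers.

    `free_regs` is an assumed list of registers that are dead at this point;
    raises StopIteration if none of them is usable.
    """
    # The instruction now runs before the copy, so read each value from the
    # register that holds it before the copy executes.
    uses = [pc.get(u, u) for u in ins.uses]
    # A copy into ins.dst would be overwritten by ins anyway: drop it.
    pc2 = {d: s for d, s in pc.items() if d != ins.dst}
    dst = ins.dst
    if dst in pc2.values():
        # Writing ins.dst early would clobber a value the copy still reads,
        # so rename the definition and let the copy move the result back.
        dst = next(r for r in free_regs
                   if r not in pc2 and r not in pc2.values())
        pc2[ins.dst] = dst
    return Instr(dst, uses), pc2


if __name__ == "__main__":
    regs = {"r1": 1, "r2": 2, "r3": 3, "r4": 0}
    pc = {"r1": "r2", "r2": "r1"}  # parallel swap of r1 and r2
    ins = Instr("r2", ["r1"])

    before = dict(regs)
    run_parallel_copy(before, pc)
    run_instr(before, ins)

    ins2, pc2 = move_copy_down(pc, ins, free_regs=["r4"])
    after = dict(regs)
    run_instr(after, ins2)
    run_parallel_copy(after, pc2)

    print(ins2, pc2)
    print(all(before[r] == after[r] for r in ("r1", "r2", "r3")))  # → True
```

In the example, the swap is pushed below `r2 := r1 + 1`: the instruction is renamed to read `r2` and write the free register `r4`, and the remaining copy `{r1 ← r2, r2 ← r4}` restores the original values of every live register.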
Experiments with SPECint and our own benchmark suite show that an SSA-based register allocator can now be applied broadly, even to procedures with non-splittable edges: such procedures could not be compiled before, but with parallel copy motion, all moves could be pushed out of these edges. Even simple strategies for moving copies out of edges and inside basic blocks yield some average improvement over the standard edge-splitting strategy (3% speedup), with a large reduction in the weighted number of copies (21% move-cost reduction for SPECint). This leads us to believe that the approach is promising, and not only for improving coalescing in fast register allocators.
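Two notions used above, critical edges and a frequency-weighted count of copies, are simple to compute on a toy control-flow graph. The sketch below assumes a CFG given as a dict from block name to successor list, plus a hypothetical `freq` map giving an estimated execution frequency for each location (block or edge). Weighting each copy by the frequency of the point where it executes is our reading of "weighted number of copies", not necessarily the exact metric used in the experiments.

```python
from collections import defaultdict


def predecessors(succs):
    """Invert a successor map {block: [successors]} into a predecessor map."""
    preds = defaultdict(list)
    for u, vs in succs.items():
        for v in vs:
            preds[v].append(u)
    return preds


def critical_edges(succs):
    """Edges from a block with several successors to a block with several predecessors."""
    preds = predecessors(succs)
    return [(u, v) for u, vs in succs.items() for v in vs
            if len(vs) > 1 and len(preds[v]) > 1]


def weighted_move_cost(copies_at, freq):
    """Sum over locations of (number of copies) x (estimated execution frequency).

    Both maps are hypothetical inputs keyed by block name or (src, dst) edge.
    """
    return sum(n * freq[loc] for loc, n in copies_at.items())


if __name__ == "__main__":
    # A -> B -> C plus the shortcut A -> C: the shortcut is critical.
    cfg = {"A": ["B", "C"], "B": ["C"], "C": []}
    print(critical_edges(cfg))  # → [('A', 'C')]
    freq = {("A", "C"): 30, "B": 10}  # made-up frequencies
    print(weighted_move_cost({("A", "C"): 2, "B": 1}, freq))  # → 70
```

Copies left on the edge `A → C` could only execute there if the edge were split; a cost function of this kind is one way to compare such a placement against candidate placements inside neighboring blocks with compensation code.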