Code scheduling and register allocation in large basic blocks
ICS '88 Proceedings of the 2nd international conference on Supercomputing
A recursive technique for computing lower-bound performance of schedules
ACM Transactions on Design Automation of Electronic Systems (TODAES)
IEEE Transactions on Computers
URSA: A Unified ReSource Allocator for Registers and Functional Units in VLIW Architectures
PACT '93 Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism
Treegion Scheduling for Wide Issue Processors
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Optimal Superblock Scheduling Using Enumeration
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Register saturation in instruction level parallelism
International Journal of Parallel Programming
Optimal global instruction scheduling using enumeration
Optimal global instruction scheduling using enumeration
Subroutine profiling results for the CPU2006 benchmarks
ACM SIGARCH Computer Architecture News
Optimal versus Heuristic Global Code Scheduling
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Optimal trace scheduling using enumeration
ACM Transactions on Architecture and Code Optimization (TACO)
Engineering A Compiler
Constraint programming techniques for optimal instruction scheduling
Constraint programming techniques for optimal instruction scheduling
Scheduling expression DAGs for minimal register need
Computer Languages
Lower-bound performance estimation for the high-level synthesis scheduling problem
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Optimal and heuristic global code motion for minimal spilling
CC'13 Proceedings of the 22nd international conference on Compiler Construction
Hi-index | 0.00 |
Balancing Instruction-Level Parallelism (ILP) and register pressure during preallocation instruction scheduling is a fundamentally important problem in code generation and optimization. The problem is known to be NP-complete. Many heuristic techniques have been proposed to solve this problem. However, due to the inherently conflicting requirements of maximizing ILP and minimizing register pressure, heuristic techniques may produce poor schedules in many cases. If such cases occur in hot code, significant performance degradation may result. A few combinatorial optimization approaches have also been proposed, but none of them has been shown to solve large real-world instances within reasonable time. This article presents the first combinatorial algorithm that is efficient enough to optimally solve large instances of this problem (basic blocks with hundreds of instructions) within a few seconds per instance. The proposed algorithm uses branch-and-bound enumeration with a number of powerful pruning techniques to efficiently search the solution space. The search is based on a cost function that incorporates schedule length and register pressure. An implementation of the proposed scheduling algorithm has been integrated into the LLVM Compiler and evaluated using SPEC CPU 2006. On x86-64, with a time limit of 10ms per instruction, it optimally schedules 79% of the hot basic blocks in FP2006. Another 19% of the blocks are not optimally scheduled but are improved in cost relative to LLVM's heuristic. This improves the execution time of some benchmarks by up to 21%, with a geometric-mean improvement of 2.4% across the entire benchmark suite. With the use of precise latency information, the geometric-mean improvement is increased to 2.8%.