Efficient instruction scheduling for a pipelined architecture
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Code scheduling and register allocation in large basic blocks
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Scheduling expressions on a pipelined processor with a maximal delay of one cycle
ACM Transactions on Programming Languages and Systems (TOPLAS)
Spill code minimization techniques for optimizing compliers
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Coloring heuristics for register allocation
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Scheduling arithmetic and load operations in parallel with no spilling
SIAM Journal on Computing
Integrating register allocation and instruction scheduling for RISCs
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Linear-time, optimal code scheduling for delayed-load architectures
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A dynamic-programming technique for compacting loops
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The Fork95 parallel programming language: design, implementation, application
International Journal of Parallel Programming
The Generation of Optimal Code for Arithmetic Expressions
Journal of the ACM (JACM)
Optimal Code Generation for Expression Trees
Journal of the ACM (JACM)
Postpass Code Optimization of Pipeline Constraints
ACM Transactions on Programming Languages and Systems (TOPLAS)
Register allocation via usage counts
Communications of the ACM
Register allocation by priority-based coloring
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Realization of PRAMs: Processor Design
WDAG '94 Proceedings of the 8th International Workshop on Distributed Algorithms
A Randomized Heuristic Approach to Register Allocation
PLILP '91 Proceedings of the 3rd International Symposium on Programming Language Implementation and Logic Programming
Register allocation & spilling via graph coloring
SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
Trace Scheduling: A Technique for Global Microcode Compaction
IEEE Transactions on Computers
Generating optimal contiguous evaluations for expression DAGs
Computer Languages
Register allocation via coloring
Computer Languages
Optimal instruction scheduling using integer programming
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
A Dynamic Programming Approach to Optimal Integrated Code Generation
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
Optimal integrated code generation for clustered VLIW architectures
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Fast Optimal Instruction Scheduling for Single-Issue Processors with Arbitrary Latencies
CP '01 Proceedings of the 7th International Conference on Principles and Practice of Constraint Programming
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Generating schedules for expression DAGs that use a minimal number of registers is a classical NP-complete optimization problem. Up to now an exact solution could only be computed for small DAGs (with up to 20 nodes), using a trivial O(n!) enumeration algorithm. We present a new algorithm with worst-case complexity O(n2^2^n) and very good average behavior. Applying a dynamic programming scheme and reordering techniques, our algorithm is able to defer the combinatorial explosion and to generate an optimal schedule not only for small DAGs but also for medium-sized ones with up to 50 nodes, a class that contains nearly all DAGs encountered in typical application programs. Experiments with randomly generated DAGs and large DAGs from real application programs confirm that the new algorithm generates optimal schedules quite fast. We extend our algorithm to cope with delay slots and multiple functional units, two common features of modern superscalar processors.