Efficient instruction scheduling for a pipelined architecture
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Code scheduling and register allocation in large basic blocks
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Scheduling arithmetic and load operations in parallel with no spilling
SIAM Journal on Computing
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Instruction scheduling for the IBM RISC System/6000 processor
IBM Journal of Research and Development
Scheduling time-critical instructions on RISC machines
POPL '90 Proceedings of the 17th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Linear-time, optimal code scheduling for delayed-load architectures
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A retargetable compiler for ANSI C
ACM SIGPLAN Notices
Optimal scheduling of arithmetic operations in parallel with memory access (preliminary version)
POPL '85 Proceedings of the 12th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
The Generation of Optimal Code for Arithmetic Expressions
Journal of the ACM (JACM)
Postpass Code Optimization of Pipeline Constraints
ACM Transactions on Programming Languages and Systems (TOPLAS)
Computers and Intractability: A Guide to the Theory of NP-Completeness
Computers and Intractability: A Guide to the Theory of NP-Completeness
Code generation and reorganization in the presence of pipeline constraints
POPL '82 Proceedings of the 9th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A Dynamic Programming Approach to Optimal Integrated Code Generation
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
Scheduling expression trees for delayed-load architectures
Journal of Systems Architecture: the EUROMICRO Journal
Hi-index | 0.00 |
A fast, optimal code-scheduling algorithm for processors with a delayed load of one instruction cycle is described. The algorithm minimizes both execution time and register use and runs in time proportional to the size of the expression-tree. An extension that spills registers when too few registers are available is also presented. The algorithm also performs very well for delayed loads of greater than one instruction cycle. A heuristic that schedules DAGs and is based on our optimal expression-tree-scheduling algorithm is presented and compared with Goodman and Hsu's algorithm Integrated Prepass Scheduling (IPS). Both schedulers perform well on benchmarks with small basic blocks, but on large basic blocks our scheduler outperforms IPS and is significantly faster.