Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Selected papers of the second workshop on Languages and compilers for parallel computing
Using profile information to assist classic code optimizations
Software—Practice & Experience
Effective compiler support for predicated execution using the hyperblock
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Code generation schema for modulo scheduled loops
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Enhanced modulo scheduling for loops with conditional branches
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Lifetime-sensitive modulo scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
ACM Computing Surveys (CSUR)
Modulo scheduling with multiple initiation intervals
Proceedings of the 28th annual international symposium on Microarchitecture
Stage scheduling: a technique to reduce the register requirements of a modulo schedule
Proceedings of the 28th annual international symposium on Microarchitecture
Modulo scheduling of loops in control-intensive non-numeric programs
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Software pipelining loops with conditional branches
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
GURPR—a method for global software pipelining
MICRO 20 Proceedings of the 20th annual workshop on Microprogramming
Parallelizing nonnumerical code with selective scheduling and software pipelining
ACM Transactions on Programming Languages and Systems (TOPLAS)
Unroll-Based Copy Elimination for Enhanced Pipeline Scheduling
IEEE Transactions on Computers
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Register-Sensitive Software Pipelining
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Swing Modulo Scheduling: A Lifetime-Sensitive Approach
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Bulldog: a compiler for vliw architectures (parallel computing, reduced-instruction-set, trace scheduling, scientific)
Probabilistic Predicate-Aware Modulo Scheduling
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Hi-index | 0.00 |
Software pipelining increases the loop execution throughput by overlapping the execution of successive iterations in a pipelined fashion. For loops with control flows, however, software pipelining is not straightforward because we need to consider the overlap of more than one execution path. Modulo scheduling simply transforms them into straightline loops through if-conversion which, in effect, achieves a fixed, worst-case initiation interval (II) among all paths. On the other hand, all-path pipelining (APP) and enhanced pipeline scheduling (EPS) can achieve a variable II depending on the path that is taken at execution time. Unfortunately, APP concentrates only on the overlap within the same path, entirely losing the overlap between different paths, whereas EPS attempts to overlap all paths together, failing to produce a tight schedule for each individual path, especially when resource constraints are tight. In this paper, we propose a new approach to EPS called split-path EPS (SP-EPS), which first splits each individual path via tail duplication and then performs EPS in a way to guarantee a tight schedule for each path, while producing a competitive cross-path schedule. We also extend SP-EPS to outer loops such that frequent paths that bypass the inner loop are split and then scheduled by SP-EPS. Our experimental results on nontrivial integer benchmarks show that SP-EPS can achieve as much as a geometric mean of 10 percent speedup over EPS when innermost loops are scheduled by SP-EPS, while it can achieve a geometric mean of 11.9 percent speedup when outer loops are also scheduled by SP-EPS.