Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Fine-grain parallelization and the wavefront method
Selected papers of the second workshop on Languages and compilers for parallel computing
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Optimizing for parallelism and data locality
ICS '92 Proceedings of the 6th international conference on Supercomputing
Lifetime-sensitive modulo scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Instruction-level parallel processing: history, overview, and perspective
The Journal of Supercomputing - Special issue on instruction-level parallelism
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Improving the ratio of memory operations to floating-point operations in loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
ACM Computing Surveys (CSUR)
Combining loop transformations considering caches and scheduling
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
A Framework for Resource-Constrained Rate-Optimal Software Pipelining
IEEE Transactions on Parallel and Distributed Systems
Achieving Full Parallelism Using Multidimensional Retiming
IEEE Transactions on Parallel and Distributed Systems
Parallelizing nonnumerical code with selective scheduling and software pipelining
ACM Transactions on Programming Languages and Systems (TOPLAS)
Cache miss equations: a compiler framework for analyzing and tuning memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
The parallel execution of DO loops
Communications of the ACM
Constructing and exploiting linear schedules with prescribed parallelism
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Loop Transformations for Restructuring Compilers: The Foundations
Loop Transformations for Restructuring Compilers: The Foundations
Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Constructive Methods for Scheduling Uniform Loop Nests
IEEE Transactions on Parallel and Distributed Systems
Efficient Pipelining of Nested Loops: Unroll-and-Squash
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Optimal Software Pipelining of Nested Loops
Proceedings of the 8th International Symposium on Parallel Processing
Automatic Parallelization in the Polytope Model
The Data Parallel Programming Model: Foundations, HPF Realization, and Scientific Applications
Pipelining-Dovetailing: A Transformation to Enhance Software Pipelining for Nested Loops
CC '96 Proceedings of the 6th International Conference on Compiler Construction
Improving Software Pipelining With Unroll-and-Jam
HICSS '96 Proceedings of the 29th Hawaii International Conference on System Sciences Volume 1: Software Technology and Architecture
Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Single-Dimension Software Pipelining for Multi-Dimensional Loops
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Register allocation for software pipelined multi-dimensional loops
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Register allocation for software pipelined multidimensional loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
VEAL: Virtualized Execution Accelerator for Loops
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
REDEFINE: Runtime reconfigurable polymorphic ASIC
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or from the innermost loop to outer loops. This paper proposes a three-step approach, called single-dimension software pipelining (SSP), to software pipeline a loop nest at an arbitrary loop level that has a rectangular iteration space and contains no sibling inner loops in it. The first step identifies the most profitable loop level for software pipelining in terms of initiation rate, data reuse potential, or any other optimization criteria. The second step simplifies the multidimensional data-dependence graph (DDG) of the selected loop level into a one-dimensional DDG and constructs a one-dimensional (1D) schedule. Based on the one-dimensional schedule, the third step derives a simple mapping function that specifies the schedule time for the operation instances in the multidimensional loop. The classical modulo scheduling is subsumed by SSP as a special case. SSP is also closely related to hyperplane scheduling, and, in fact, extends it to be resource constrained. We prove that SSP schedules are correct and at least as efficient as those schedules generated by traditional modulo scheduling methods. We extend SSP to schedule imperfect loop nests, which are most common at the instruction level. Multiple initiation intervals are naturally allowed to improve execution efficiency. Feasibility and correctness of our approach are verified by a prototype implementation in the ORC compiler for the IA-64 architecture, tested with loop nests from Livermore and SPEC2000 floating-point benchmarks. Preliminary experimental results reveal that, compared to modulo scheduling, software pipelining at an appropriate loop level results in significant performance improvement. Software pipelining is beneficial even with prior loop transformations.