Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Improving register allocation for subscripted variables
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Improving the ratio of memory operations to floating-point operations in loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
ACM Computing Surveys (CSUR)
Combining loop transformations considering caches and scheduling
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
An infeasible interior-point algorithm for solving primal and dual geometric programs
Mathematical Programming: Series A and B - Special issue: interior point methods in theory and practice
Quantifying the multi-level nature of tiling interactions
International Journal of Parallel Programming
Loop tiling for parallelism
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Scheduling and Automatic Parallelization
Scheduling and Automatic Parallelization
Optimized Unrolling of Nested Loops
International Journal of Parallel Programming
Register tiling in nonrectangular iteration spaces
ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimal Software Pipelining of Nested Loops
Proceedings of the 8th International Symposium on Parallel Processing
Hierarchical tiling for improved superscalar performance
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
An experimental evaluation of scalar replacement on scientific benchmarks
Software—Practice & Experience
Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Single-Dimension Software Pipelining for Multi-Dimensional Loops
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Convex Optimization
A Geometric Programming Framework for Optimal Multi-Level Tiling
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Register allocation for software pipelined multi-dimensional loops
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
IEEE Transactions on Parallel and Distributed Systems
Compact multi-dimensional kernel extraction for register tiling
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Hi-index | 0.00 |
Efficient use of multiple pipelined functional units and registers is very important for achieving high performance on modern processors. Instruction Level Parallelism (ILP) and register reuse (through register tiling) are two mechanisms for this, respectively. Program transformations that expose and exploit ILP and register reuse interact with each other in subtle ways. We study the combined problem of optimal ILP and register reuse. We consider the class of uniform dependence, fully permutable, rectangular loop nests. We develop an analytical model of the combined problem and formulate a mathematical optimization problem that chooses the parameters of the ILP-exposing transformation and register tiling so as to minimize the total execution time. We distinguish two cases: when loop permutation can and cannot expose a parallel loop. We show that the combined problem can be reduced to a single integer convex optimization problem for the former case, and to a set of integer convex optimization problems for the latter case, both of which can be solved to global optimality.