Combined ILP and register tiling: analytical model and optimization framework

Authors:
Lakshminarayanan Renganarayana;U. Ramakrishna;Sanjay Rajopadhye
Affiliations:
Computer Science Department, Colorado State University;Computer Science Department, Colorado State University;Computer Science Department, Colorado State University
Venue:
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Year:
2005

Citing 22
Cited 1

Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Improving register allocation for subscripted variables

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Iterative modulo scheduling: an algorithm for software pipelining loops

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Improving the ratio of memory operations to floating-point operations in loops

ACM Transactions on Programming Languages and Systems (TOPLAS)
Software pipelining

ACM Computing Surveys (CSUR)
Combining loop transformations considering caches and scheduling

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
An infeasible interior-point algorithm for solving primal and dual geometric programs

Mathematical Programming: Series A and B - Special issue: interior point methods in theory and practice
Quantifying the multi-level nature of tiling interactions

International Journal of Parallel Programming
Loop tiling for parallelism

Loop tiling for parallelism
Optimizing compilers for modern architectures: a dependence-based approach

Optimizing compilers for modern architectures: a dependence-based approach
Scheduling and Automatic Parallelization

Scheduling and Automatic Parallelization
Optimized Unrolling of Nested Loops

International Journal of Parallel Programming
Register tiling in nonrectangular iteration spaces

ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimal Software Pipelining of Nested Loops

Proceedings of the 8th International Symposium on Parallel Processing
Hierarchical tiling for improved superscalar performance

IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
An experimental evaluation of scalar replacement on scientific benchmarks

Software—Practice & Experience
Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Single-Dimension Software Pipelining for Multi-Dimensional Loops

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Convex Optimization

Convex Optimization
A Geometric Programming Framework for Optimal Multi-Level Tiling

Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Register allocation for software pipelined multi-dimensional loops

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Optimal semi-oblique tiling

IEEE Transactions on Parallel and Distributed Systems

Compact multi-dimensional kernel extraction for register tiling

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

Quantified Score

Hi-index	0.00

Visualization

Abstract

Efficient use of multiple pipelined functional units and registers is very important for achieving high performance on modern processors. Instruction Level Parallelism (ILP) and register reuse (through register tiling) are two mechanisms for this, respectively. Program transformations that expose and exploit ILP and register reuse interact with each other in subtle ways. We study the combined problem of optimal ILP and register reuse. We consider the class of uniform dependence, fully permutable, rectangular loop nests. We develop an analytical model of the combined problem and formulate a mathematical optimization problem that chooses the parameters of the ILP-exposing transformation and register tiling so as to minimize the total execution time. We distinguish two cases: when loop permutation can and cannot expose a parallel loop. We show that the combined problem can be reduced to a single integer convex optimization problem for the former case, and to a set of integer convex optimization problems for the latter case, both of which can be solved to global optimality.