Combined partitioning and data padding for scheduling multiple loop nests
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
A Model-Based Framework: An Approach for Profit-Driven Optimization
Proceedings of the international symposium on Code generation and optimization
A Geometric Programming Framework for Optimal Multi-Level Tiling
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
A New Genetic Algorithm for Loop Tiling
The Journal of Supercomputing
An approach toward profit-driven optimization
ACM Transactions on Architecture and Code Optimization (TACO)
Microarchitecture Sensitive Empirical Models for Compiler Optimizations
Proceedings of the International Symposium on Code Generation and Optimization
Positivity, posynomials and tile size selection
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Automatic creation of tile size selection models
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
DMATiler: revisiting loop tiling for direct memory access
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
On-chip cache hierarchy-aware tile scheduling for multicore machines
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Analytical bounds for optimal tile size selection
CC'12 Proceedings of the 21st international conference on Compiler Construction
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
The authors address the problem of estimating the performance of loop tiling, an important program transformation for improved memory hierarchy utilization. We introduce an analytical model for estimating the memory cost of a loop nest as a rational polynomial in tile size variables. We also present a constant-time algorithm for finding an optimal solution to the model (i.e., for selecting optimal tile sizes) for the case of doubly nested loops. This solution can be applied to tiling of three loops by performing an iterative search on the value of the first tile size variable, and using the constant-time algorithm at each point in the search to obtain optimal tile size values for the remaining two loops. Our solution is efficient enough to be used in production-quality optimizing compilers, and has been implemented in the IBM XL Fortran product compilers. This solution can also be used by processor designers to efficiently predict the performance of a set of tiled loops for a range of memory hierarchy parameters.