POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A bridging model for parallel computation
Communications of the ACM
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
MOB forms: a class of multilevel block algorithms for dense linear algebra operations
ICS '94 Proceedings of the 8th international conference on Supercomputing
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
ICS '96 Proceedings of the 10th international conference on Supercomputing
Combining loop transformations considering caches and scheduling
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
An infeasible interior-point algorithm for solving primal and dual geometric programs
Mathematical Programming: Series A and B - Special issue: interior point methods in theory and practice
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology
ICS '97 Proceedings of the 11th international conference on Supercomputing
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Eliminating conflict misses for high performance architectures
ICS '98 Proceedings of the 12th international conference on Supercomputing
A tile selection algorithm for data locality and cache interference
ICS '99 Proceedings of the 13th international conference on Supercomputing
Quantifying the multi-level nature of tiling interactions
International Journal of Parallel Programming
Analytical Modeling of Set-Associative Cache Behavior
IEEE Transactions on Computers
Cache miss equations: a compiler framework for analyzing and tuning memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
Loop tiling for parallelism
Exact analysis of the cache behavior of nested loops
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Automatically tuned linear algebra software
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
On Estimating and Enhancing Cache Effectiveness
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Iteration Space Tiling for Memory Hierarchies
Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing
On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems
Estimating cache misses and locality using stack distances
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Near-Optimal Loop Tiling by Means of Cache Miss Equations and Genetic Algorithms
ICPPW '02 Proceedings of the 2002 International Conference on Parallel Processing Workshops
Automatic Blocking of Nested Loops
Automatic Blocking of Nested Loops
Convex Optimization
An analytical model for loop tiling and its solution
ISPASS '00 Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software
IEEE Transactions on Parallel and Distributed Systems
A cost-effective implementation of multilevel tiling
IEEE Transactions on Parallel and Distributed Systems
Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs
The Journal of Supercomputing
Effective automatic parallelization of stencil computations
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Positivity, posynomials and tile size selection
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Parametric multi-level tiling of imperfectly nested loops
Proceedings of the 23rd international conference on Supercomputing
Exploring parallelization strategies for NUFFT data translation
EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software
Parameterized tiling revisited
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Scalable parallelization strategies to accelerate NuFFT data translation on multicores
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
Combined ILP and register tiling: analytical model and optimization framework
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Automated Mapping of the MapReduce Pattern onto Parallel Computing Platforms
Journal of Signal Processing Systems
Hi-index | 0.00 |
Determining the optimal tile size-one that minimizes the execution time-is a classical problem in compilation and performance tuning of loop kernels. Designing a model of the overall execution time of a tiled loop nest is an important subproblem. Both problems become harder when tiling is applied at multiple levels. We present a framework for determining the optimal tile sizes for a fully permutable, perfectly nested, rectangular loop with uniform dependences. Our framework supports multiple levels of tiling and uses a BSP style high level model for estimating the overall execution time of a loop program. In our framework, the problem of determining the optimal tile sizes, subject to memory capacity and bandwidth constraints, is modeled as a geometric program and transformed into a convex optimization problem, which can be solved efficiently. The model is validated through experimental results obtained by running twenty loop programs for different levels of tiling and different program and tile parameters. Our framework is very general and can also be used to solve the optimal tile size problem with many other models of execution time.