A Geometric Programming Framework for Optimal Multi-Level Tiling

Authors:
Lakshminarayanan Renganarayana;Sanjay Rajopadhye
Affiliations:
Colorado State University;Colorado State University
Venue:
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Year:
2004



Abstract

Determining the optimal tile size (the one that minimizes execution time) is a classical problem in compilation and performance tuning of loop kernels. Designing a model of the overall execution time of a tiled loop nest is an important subproblem. Both problems become harder when tiling is applied at multiple levels. We present a framework for determining the optimal tile sizes for a fully permutable, perfectly nested, rectangular loop with uniform dependences. Our framework supports multiple levels of tiling and uses a BSP-style high-level model to estimate the overall execution time of a loop program. In our framework, the problem of determining the optimal tile sizes, subject to memory capacity and bandwidth constraints, is modeled as a geometric program and transformed into a convex optimization problem, which can be solved efficiently. The model is validated through experimental results obtained by running twenty loop programs for different levels of tiling and different program and tile parameters. Our framework is very general and can also be used to solve the optimal tile size problem with many other models of execution time.
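
To make the step from geometric program to convex problem concrete, here is a minimal sketch in Python. It is not the paper's execution-time model. The cost 1/x + 1/y, the single capacity constraint x*y <= capacity, and the names `log_sum_exp` and `solve_toy_tile_gp` are assumptions chosen only for illustration. Substituting u = log x and v = log y turns the posynomial objective into a convex log-sum-exp function and the monomial constraint into the linear inequality u + v <= log(capacity). In this toy problem the cost decreases in both x and y, so the constraint is tight at the optimum and a one-dimensional ternary search is enough; a general geometric program would need a proper convex solver.

```python
import math


def log_sum_exp(terms):
    """Numerically stable log(sum(exp(t) for t in terms))."""
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def solve_toy_tile_gp(capacity, iters=200):
    """Minimize 1/x + 1/y subject to x * y <= capacity, with x, y >= 1.

    Hypothetical toy model: x and y stand for tile sizes, 1/x + 1/y for
    a per-iteration overhead, and capacity for a memory footprint limit.
    """
    log_cap = math.log(capacity)

    def objective(u):
        # The capacity constraint is active, so v = log(capacity) - u.
        v = log_cap - u
        # log(1/x + 1/y) expressed in the log-domain variables u, v.
        return log_sum_exp([-u, -v])

    # Ternary search over u in [0, log(capacity)]; valid because the
    # objective is convex in u.
    lo, hi = 0.0, log_cap
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    u = (lo + hi) / 2.0
    # Map the log-domain solution back to tile sizes.
    return math.exp(u), math.exp(log_cap - u)


if __name__ == "__main__":
    x, y = solve_toy_tile_gp(1024.0)
    print(f"x = {x:.4f}, y = {y:.4f}, footprint = {x * y:.4f}")
```

By symmetry the optimum of this toy problem is x = y = sqrt(capacity), so a capacity of 1024 should give tile sizes close to 32 in each dimension, which serves as a quick check on the search.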