The program dependence graph and its use in optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
Stencils and problem partitionings: their influence on the performance of multiple processor systems
IEEE Transactions on Computers
Estimating interlock and improving balance for pipelined architectures
Journal of Parallel and Distributed Computing
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Introduction to algorithms
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Tiling multidimensional iteration spaces for nonshared memory machines
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Optimizing for parallelism and data locality
ICS '92 Proceedings of the 6th international conference on Supercomputing
Compiler blockability of numerical algorithms
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Improving locality and parallelism in nested loops
Improving locality and parallelism in nested loops
Integration, the VLSI Journal
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
LogP: a practical model of parallel computation
Communications of the ACM
Combining loop transformations considering caches and scheduling
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology
ICS '97 Proceedings of the 11th international conference on Supercomputing
Determining the idle time of a tiling
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
New tiling techniques to improve cache temporal locality
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Selecting tile shape for minimal execution time
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Generation of Efficient Nested Loops from Polyhedra
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Optimizing memory usage in the polyhedral model
ACM Transactions on Programming Languages and Systems (TOPLAS)
Automatically tuned linear algebra software
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Quantifying the Multi-Level Nature of Tiling Interactions
International Journal of Parallel Programming
Reuse-Driven Tiling for Improving Data Locality
International Journal of Parallel Programming
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
Hierarchical tiling for improved superscalar performance
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Quantifying the Multi-level Nature of Tiling Interactions
LCPC '97 Proceedings of the 10th International Workshop on Languages and Compilers for Parallel Computing
Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Iteration Space Tiling for Memory Hierarchies
Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing
Euro-Par '98 Proceedings of the 4th International Euro-Par Conference on Parallel Processing
Determining the Idle Time of a Tiling: New Results
PACT '97 Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques
Automatic Partitioning of Signal Processing Programs for Symmetric Multiprocessors
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Combining Optimization for Cache and Instruction-Level Parallelism
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Predicting performance for tiled perfectly nested loops
Predicting performance for tiled perfectly nested loops
Automatic parallel code generation for tiled nested loops
Proceedings of the 2004 ACM symposium on Applied computing
A Geometric Programming Framework for Optimal Multi-Level Tiling
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs
The Journal of Supercomputing
Parallel Dynamic Programming on Clusters of Workstations
IEEE Transactions on Parallel and Distributed Systems
A New Approach to Parallelization of Serial Nested Loops Using Genetic Algorithms
The Journal of Supercomputing
Optimizing code parallelization through a constraint network based approach
Proceedings of the 43rd annual Design Automation Conference
A New Genetic Algorithm for Loop Tiling
The Journal of Supercomputing
Message-passing code generation for non-rectangular tiling transformations
Parallel Computing
Distributed algorithms for partially clairvoyant dispatchers
Cluster Computing
Positivity, posynomials and tile size selection
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Global Tiling for Communication Minimal Parallelization on Distributed Memory Systems
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Slicing based code parallelization for minimizing inter-processor communication
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Code scheduling for optimizing parallelism and data locality
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Selecting the tile shape to reduce the total communication volume
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Data locality and parallelism optimization using a constraint-based approach
Journal of Parallel and Distributed Computing
A Case Study of Implementing Supernode Transformations
International Journal of Parallel Programming
Hi-index | 0.00 |
Many computationally-intensive programs, such as those for differential equations, spatial interpolation, and dynamic programming, spend a large portion of their execution time in multiply-nested loops that have a regular stencil of data dependences. Tiling is a well-known compiler optimization that improves performance on such loops, particularly for computers with a multileveled hierarchy of parallelism and memory. Most previous work on tiling is limited in at least one of the following ways: they only handle nested loops of depth two, orthogonal tiling, or rectangular tiles. In our work, we tile loop nests of arbitrary depth using polyhedral tiles. We derive a prediction formula for the execution time of such tiled loops, which can be used by a compiler to automatically determine the tiling parameters that minimizes the execution time. We also explain the notion of rise, a measure of the relationship between the shape of the tiles and the shape of the iteration space generated by the loop nest. The rise is a powerful tool in predicting the execution time of a tiled loop. It allows us to reason about how the tiling affects the length of the longest path of dependent tiles, which is a measure of the execution time of a tiling. We use a model of the tiled iteration space that allows us to determine the length of the longest path of dependent tiles using linear programming. Using the rise, we derive a simple formula for the length of the longest path of dependent tiles in rectilinear iteration spaces, a subclass of the convex iteration spaces, and show how to choose the optimal tile shape.