Automatic Partitioning of Parallel Loops with Parallelepiped-Shaped Tiles

Authors:
Fabrice Rastello;Yves Robert
Affiliations:
ST Microelectronics, Grenoble, France;Institut National de Recherche en Informatique et en Automatique (INRIA), Lyon, France
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2002

Citing 14
Cited 7

Supernode partitioning

POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
(Pen)-ultimate tiling?

Integration, the VLSI Journal
Unifying data and control transformations for distributed shared-memory machines

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Tiling nested loops into maximal rectangular blocks

Journal of Parallel and Distributed Computing
Determining the idle time of a tiling

Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Improving Cache Locality by a Combination of Loop and Data Transformations

IEEE Transactions on Computers - Special issue on cache memory and related problems
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts

IEEE Transactions on Parallel and Distributed Systems
A Unified Framework for Optimizing Locality, Parallelism, and Communication in Out-of-Core Computations

IEEE Transactions on Parallel and Distributed Systems
Minimizing Data and Synchronization Costs in One-Way Communication

IEEE Transactions on Parallel and Distributed Systems
Data Relation Vectors: A New Abstraction for Data Optimizations

IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
A Loop Transformation Theory and an Algorithm to Maximize Parallelism

IEEE Transactions on Parallel and Distributed Systems
Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Precise Tiling for Uniform Loop Nests

ASAP '95 Proceedings of the IEEE International Conference on Application Specific Array Processors
Automatic Blocking of Nested Loops

Efficient Tiling for an ODE Discrete Integration Program: Redundant Tasks Instead of Trapezoidal Shaped-Tiles

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A New Genetic Algorithm for Loop Tiling

The Journal of Supercomputing
Positivity, posynomials and tile size selection

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Parallelizing query optimization

Proceedings of the VLDB Endowment
Dependency-aware reordering for parallelizing query optimization in multi-core CPUs

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Parallel loop generation and scheduling

The Journal of Supercomputing
Loop parallelization in multi-dimensional cartesian space

PSI'06 Proceedings of the 6th international Andrei Ershov memorial conference on Perspectives of systems informatics


Abstract

This paper introduces and evaluates an efficient algorithm for loop partitioning. We start from the results of Agarwal et al., whose aim is to minimize the number of data elements accessed during the computation of a tile; this number is called the cumulative footprint of the tile. We improve these results in several directions. First, we derive a new formulation of the cumulative footprint that allows an analytical solution of the optimization problem stated by Agarwal et al. Second, we handle arbitrary parallelepiped-shaped tiles, as opposed to the rectangular tiles considered there. We design an efficient heuristic to determine the optimal tile shape in this general setting, and we show its usefulness using both examples from Agarwal et al. and a large collection of randomly generated data.
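
To make the notion of cumulative footprint concrete, the following is a minimal brute-force sketch and not the paper's analytical formulation or heuristic. It assumes a two-dimensional loop nest, a tile given by two integer edge vectors, and uniform references of the form A[i + d] into a single array; the function names `tile_points` and `cumulative_footprint` are hypothetical.

```python
from itertools import product


def tile_points(u, v):
    """Integer points of the half-open parallelogram {s*u + t*v : 0 <= s, t < 1}.

    u and v are the integer edge vectors of the tile (an assumption of this sketch).
    """
    det = u[0] * v[1] - u[1] * v[0]
    if det == 0:
        raise ValueError("edge vectors must be linearly independent")
    d = abs(det)
    sign = 1 if det > 0 else -1
    # Bounding box of the four corners of the parallelogram.
    xs = [0, u[0], v[0], u[0] + v[0]]
    ys = [0, u[1], v[1], u[1] + v[1]]
    points = []
    for x, y in product(range(min(xs), max(xs) + 1),
                        range(min(ys), max(ys) + 1)):
        # Cramer's rule with integer arithmetic: s = s_num / d, t = t_num / d.
        s_num = sign * (x * v[1] - y * v[0])
        t_num = sign * (u[0] * y - u[1] * x)
        if 0 <= s_num < d and 0 <= t_num < d:
            points.append((x, y))
    return points


def cumulative_footprint(points, offsets):
    """Number of distinct elements A[i + d] touched over all tile points."""
    touched = {(x + dx, y + dy) for (x, y) in points for (dx, dy) in offsets}
    return len(touched)


# Two uniform references, A[i, j] and A[i + 1, j + 1] (illustrative choice).
offsets = [(0, 0), (1, 1)]

rect = tile_points((4, 0), (0, 4))   # 4 x 4 rectangular tile, 16 iterations
skew = tile_points((4, 4), (0, 4))   # parallelogram tile, also 16 iterations

print(cumulative_footprint(rect, offsets))  # 23
print(cumulative_footprint(skew, offsets))  # 20
```

In this small example both tiles contain 16 iterations, but the parallelogram whose edge follows the offset (1, 1) touches fewer distinct array elements than the square tile, which is the kind of gain that motivates looking beyond rectangular tiles.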