Message-passing code generation for non-rectangular tiling transformations

Authors:
Georgios Goumas;Nikolaos Drosinos;Maria Athanasaki;Nectarios Koziris
Affiliations:
School of Electrical and Computer Engineering, National Technical University of Athens, Computing Systems Laboratory, Zografou Campus, 15773 Zografou, Greece;School of Electrical and Computer Engineering, National Technical University of Athens, Computing Systems Laboratory, Zografou Campus, 15773 Zografou, Greece;School of Electrical and Computer Engineering, National Technical University of Athens, Computing Systems Laboratory, Zografou Campus, 15773 Zografou, Greece;School of Electrical and Computer Engineering, National Technical University of Athens, Computing Systems Laboratory, Zografou Campus, 15773 Zografou, Greece
Venue:
Parallel Computing
Year:
2006

Abstract

Tiling is a well-known loop transformation used to reduce communication overhead on distributed-memory machines. Although considerable theoretical research has addressed the selection of tile shapes that reduce processor idle time, there is no complete approach for automatically parallelizing non-rectangularly tiled iteration spaces. Consequently, there are no experimental results to verify previous theoretical work on the effect of tile shape on the overall completion time of a tiled algorithm. This paper presents a complete end-to-end framework that automatically generates message-passing code for tiled iteration spaces. It considers general parallelepiped tiling transformations and convex iteration spaces. We aim to address all problems concerning data-parallel code generation efficiently by transforming the initial non-rectangular tile into a rectangular one, so that data distribution and the corresponding communication pattern become simple and straightforward. We have implemented our parallelization techniques in a tool that automatically generates MPI code, and we have run several benchmarks on a cluster of PCs. Our experimental results show the merit of general parallelepiped tiling transformations and verify previous theoretical work on scheduling-optimal, non-rectangular tile shapes.
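
To make the notion of a parallelepiped tiling concrete, the following minimal sketch uses the standard textbook formulation in which a tiling is described by a matrix H and the iteration point j belongs to the tile with index floor(H j). It is an illustration only, not the code generation scheme of the paper: the names `tile_of`, `group_by_tile`, `H_RECT`, and `H_SKEW`, as well as the 4x4 iteration space, are assumptions introduced here.

```python
from fractions import Fraction
from math import floor


def tile_of(H, j):
    """Return the tile index floor(H j) of iteration point j.

    H is a list of rows; each row is normal to one family of tile
    boundaries. Exact rationals avoid floating-point rounding at the
    tile borders.
    """
    return tuple(floor(sum(h * x for h, x in zip(row, j))) for row in H)


def group_by_tile(H, points):
    """Map each tile index to the list of iteration points it contains."""
    tiles = {}
    for j in points:
        tiles.setdefault(tile_of(H, j), []).append(j)
    return tiles


# Hypothetical example matrices (not taken from the paper).
# H_RECT: rectangular 2x2 tiles.
H_RECT = [[Fraction(1, 2), Fraction(0)],
          [Fraction(0), Fraction(1, 2)]]
# H_SKEW: parallelogram tiles bounded by lines i - j = const and j = const.
H_SKEW = [[Fraction(1, 2), Fraction(-1, 2)],
          [Fraction(0), Fraction(1, 2)]]

# A small rectangular (hence convex) iteration space.
points = [(i, j) for i in range(4) for j in range(4)]

rect_tiles = group_by_tile(H_RECT, points)
skew_tiles = group_by_tile(H_SKEW, points)

if __name__ == "__main__":
    for name, tiles in (("rectangular", rect_tiles), ("skewed", skew_tiles)):
        print(name)
        for index in sorted(tiles):
            print("  tile", index, "->", tiles[index])
```

With the rectangular matrix every tile is a full 2x2 block, whereas the skewed matrix produces partial tiles along the boundary of the iteration space. Boundary tiles of this kind are one reason why data distribution and communication are harder to express for non-rectangular tilings.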