On Supernode Transformation with Minimized Total Running Time

Authors:
Edin Hodzic;Weijia Shang
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1998



Abstract

With the objective of minimizing the total execution time of a parallel program on a distributed memory parallel computer, this paper discusses how to find an optimal supernode size and optimal supernode relative side lengths of a supernode transformation (also known as tiling). We identify three parameters of supernode transformation: supernode size, relative side lengths, and cutting hyperplane directions. For algorithms with perfectly nested loops and uniform dependencies, for sufficiently large supernodes and number of processors, and for the case where multiple supernodes are mapped to a single processor, we give an order n polynomial whose real positive roots include the optimal supernode size. For two special cases, 1) two-dimensional algorithm problems and 2) n-dimensional algorithm problems, where the communication cost is dominated by the startup penalty and, therefore, can be approximated by a constant, we give a closed form expression for the optimal supernode size, which is independent of the supernode relative side lengths and cutting hyperplanes. For the case where the algorithm iteration index space and the supernodes are hyperrectangular, we give closed form expressions for the optimal supernode relative side lengths. Our experiment shows a good match of the closed form expressions with experimental data.
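The trade-off behind an optimal supernode size can be illustrated with a deliberately simplified model. The sketch below is not the paper's derivation, and every name and formula in it is an assumption made for illustration. It assumes a square two-dimensional iteration space of side `n_side`, square supernodes containing `g` iterations each, a wavefront schedule of roughly `2 * n_side / sqrt(g)` steps, a per-iteration compute time `t_c`, and a constant communication startup cost `t_s` per step. Under these assumptions the total time is proportional to `sqrt(g) * t_c + t_s / sqrt(g)`, which is minimized at `g = t_s / t_c`, a value that does not depend on the size of the iteration space.

```python
import math


def total_time(g, n_side, t_c, t_s):
    """Toy running-time model (an illustrative assumption, not the paper's formula).

    g      : iterations per square supernode
    n_side : side length of the square 2-D iteration space
    t_c    : time to compute one iteration
    t_s    : constant communication startup cost paid once per schedule step
    """
    steps = 2.0 * n_side / math.sqrt(g)   # approximate wavefront schedule length
    return steps * (g * t_c + t_s)        # each step: compute one supernode, then communicate


def optimal_size_closed_form(t_c, t_s):
    """Minimizer of the toy model: setting dT/dg = 0 gives g = t_s / t_c."""
    return t_s / t_c


def optimal_size_brute_force(n_side, t_c, t_s, g_max):
    """Check the closed form by scanning integer supernode sizes 1..g_max."""
    return min(range(1, g_max + 1), key=lambda g: total_time(g, n_side, t_c, t_s))
```

With hypothetical costs `t_c = 1.0` and `t_s = 100.0`, both the closed form and the brute-force scan over sizes up to 1000 select a supernode of 100 iterations in this toy model. Small supernodes pay the startup cost too often, while large ones lengthen each pipeline stage.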