Distribution Assignment Placement: Effective Optimization of Redistribution Costs

Authors:
Jens Knoop; Eduard Mehofer
Affiliations:
Univ. of Dortmund, Dortmund, Germany; Univ. of Vienna, Vienna, Austria
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2002

Abstract

Data locality and workload balance are key factors for achieving high performance with data-parallel programs on multiprocessor architectures. Data-parallel languages such as High-Performance Fortran (HPF) therefore allow programmers both to specify data distributions and to change them dynamically in order to maintain these properties. Redistributions, however, can be quite expensive and can significantly degrade a program's performance, so they must be kept to a minimum. In this article, we present a novel, aggressive approach for avoiding unnecessary remappings that works by eliminating partially dead and partially redundant distribution changes. The approach extends and combines two algorithms for these optimizations, each of which achieves optimal results on its own. Unlike the sequential setting, the data-parallel setting naturally leads to a family of algorithms of varying power and efficiency, which allows solutions customized to specific requirements. The power and flexibility of the new approach are demonstrated by various examples, ranging from typical HPF fragments to real-world programs. Performance measurements underline its importance and show its effectiveness on different hardware platforms and in different settings.
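To make the two kinds of useless distribution change concrete, the following Python sketch removes fully dead and fully redundant redistributions from a straight-line sequence of operations on a single array. It is a simplified illustration under stated assumptions, not the authors' algorithm: the operation encoding, the function names remove_dead, remove_redundant and optimize, and the assumption that the array's distribution no longer matters at program exit are all hypothetical. The approach described above targets partially dead and partially redundant changes on flow graphs, which requires data-flow analysis and placement (code motion) rather than a single linear scan.

```python
from typing import List, Optional, Tuple

# Hypothetical IR (an assumption, not HPF syntax): a straight-line program
# over ONE array, given as ("redist", distribution) or ("use", None) ops.
Op = Tuple[str, Optional[str]]


def remove_dead(ops: List[Op]) -> List[Op]:
    """Drop redistributions whose result is never used, i.e. another
    redistribution or the program end (assumed: distribution irrelevant
    at exit) is reached before any use of the array."""
    kept: List[Op] = []
    used_later = False  # is there a use before the next kept redistribution?
    for kind, dist in reversed(ops):
        if kind == "use":
            used_later = True
            kept.append((kind, dist))
        elif used_later:
            # Live redistribution: its layout is read by a later use.
            kept.append((kind, dist))
            used_later = False
        # Otherwise the redistribution is dead and is simply dropped.
    kept.reverse()
    return kept


def remove_redundant(ops: List[Op], initial: str) -> List[Op]:
    """Drop redistributions to the distribution the array already has."""
    kept: List[Op] = []
    current = initial
    for kind, dist in ops:
        if kind == "redist":
            if dist == current:
                continue  # redundant: no remapping is needed
            current = dist
        kept.append((kind, dist))
    return kept


def optimize(ops: List[Op], initial: str) -> List[Op]:
    """Apply both eliminations until nothing changes; removing dead
    redistributions can expose new redundant ones."""
    while True:
        new_ops = remove_redundant(remove_dead(ops), initial)
        if new_ops == ops:
            return new_ops
        ops = new_ops
```

For example, starting from a BLOCK distribution, the sequence "redistribute to CYCLIC, redistribute to BLOCK, use, redistribute to BLOCK, use, redistribute to CYCLIC" reduces to the two uses alone: the first and last redistributions are dead, and the remaining two are redundant once the dead ones are gone. Iterating to a fixpoint is a simple way to capture such second-order effects between the two eliminations.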