Data locality and workload balance are key factors in getting high performance out of data-parallel programs on multiprocessor architectures. Data-parallel languages such as High Performance Fortran (HPF) therefore let a programmer both specify data distributions and change them dynamically in order to maintain these properties. Redistributions, however, can be quite expensive and can significantly degrade a program's performance, so they must be reduced to a minimum. In this article, we present a novel, aggressive approach for avoiding unnecessary remappings, which works by eliminating partially dead and partially redundant distribution changes. The approach extends and combines two algorithms for these optimizations, each of which achieves optimal results on its own. In contrast to the sequential setting, the data-parallel setting leads naturally to a family of algorithms of varying power and efficiency, allowing solutions customized to the requirements at hand. The power and flexibility of the new approach are demonstrated by various examples, ranging from typical HPF fragments to real-world programs. Performance measurements underline its importance and show its effectiveness on different hardware platforms and in different settings.
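To make the two kinds of unnecessary remapping concrete, the following toy sketch removes dead and redundant distribution changes from a straight-line sequence of operations. It is an illustration only, not the algorithm of the article: it ignores control flow (so it handles only full, not partial, deadness and redundancy), and the operation encoding, the function name `eliminate_remaps`, and the assumption that distributions are not observed after the last operation are all hypothetical choices made for this example.

```python
def eliminate_remaps(ops, initial):
    """Drop dead and redundant remappings from straight-line code.

    ops     -- list of ("remap", array, distribution) or ("use", array)
    initial -- dict mapping each array to its distribution on entry

    Assumption (hypothetical): distributions are not observed after the
    last operation, so a trailing remap with no later use is dead.
    """
    # Backward pass: a remap is dead if the array is not used before the
    # next remap of the same array (or before the end of the sequence).
    needed = set()  # arrays whose current distribution is read later
    kept = []
    for op in reversed(ops):
        if op[0] == "use":
            needed.add(op[1])
            kept.append(op)
        else:
            _, array, _ = op
            if array in needed:
                kept.append(op)
                needed.discard(array)  # earlier remaps are overwritten here
            # otherwise the remap is dead and is dropped
    kept.reverse()

    # Forward pass: a remap is redundant if the array already has the
    # requested distribution when the remap is reached.
    current = dict(initial)
    result = []
    for op in kept:
        if op[0] == "remap":
            _, array, dist = op
            if current.get(array) == dist:
                continue  # redundant: nothing would change
            current[array] = dist
        result.append(op)
    return result


example = [
    ("remap", "A", "CYCLIC"),  # dead: overwritten before any use
    ("remap", "A", "BLOCK"),   # redundant once the dead remap is gone
    ("use", "A"),
    ("remap", "A", "BLOCK"),   # redundant: A is already BLOCK
    ("use", "A"),
    ("remap", "A", "CYCLIC"),  # dead under the exit assumption above
]
optimized = eliminate_remaps(example, {"A": "BLOCK"})
```

Running the dead-remap pass first matters in this sketch: the second remap only becomes redundant after the dead one before it has been removed, a small instance of one elimination enabling another.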