Memory storage patterns in parallel processing
Parallel and distributed computation: numerical methods
A methodology for parallelizing programs for multicomputers and complex memory multiprocessors
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Optimum Broadcasting and Personalized Communication in Hypercubes
IEEE Transactions on Computers
Data optimization: allocation of arrays to reduce communication on SIMD machines
Journal of Parallel and Distributed Computing - Massively parallel computation
Performance modeling of distributed memory architectures
Journal of Parallel and Distributed Computing
Automatic data mapping for distributed-memory parallel computers
ICS '92 Proceedings of the 6th international conference on Supercomputing
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Automatic data partitioning on distributed memory multicomputers
Concurrent scientific computing
A novel approach towards automatic data distribution
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Benchmark Evaluation of the IBM SP2 for Parallel Signal Processing
IEEE Transactions on Parallel and Distributed Systems
Deriving structured parallel implementations for numerical methods
Microprocessing and Microprogramming - Special double issue: parallel systems engineering
Early prediction of MPP performance: the SP2, T3D, and Paragon experiences
Parallel Computing
The ADDAP system on the iPSC/860: automatic data distribution and parallelization
Journal of Parallel and Distributed Computing
Introduction to Parallel Computing
Compile-Time Techniques for Data Distribution in Distributed Memory Machines
IEEE Transactions on Parallel and Distributed Systems
Comparing Task and Data Parallel Execution Schemes for the DIIRK Method
Euro-Par '96 Proceedings of the Second International Euro-Par Conference on Parallel Processing-Volume II
Automatic Data Layout Using 0-1 Integer Programming
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Modeling the Communication Behavior of the Intel Paragon
MASCOTS '97 Proceedings of the 5th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Compiler techniques for optimizing communication and data distribution for distributed-memory multicomputers
Orthogonal Processor Groups for Message-Passing Programs
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
Efficiency of Thread-Parallel Java Programs from Scientific Computing
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Set Operations for Orthogonal Processor Groups
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Optimization of Data Distribution and Processor Allocation Problem Using Simulated Annealing
The Journal of Supercomputing
Memetic algorithms for parallel code optimization
International Journal of Parallel Programming
The paper presents a new method for deriving data distributions for parallel computers with distributed-memory organization by means of a mathematical optimization technique. Prerequisites for this approach are a parameterized data distribution and a rigorous performance-prediction technique that yields runtime formulas containing the parameters of the data distribution. A mathematical optimization technique can then determine the parameters so that the total runtime is minimized, thereby also minimizing the communication overhead and the load-imbalance penalty. The method is demonstrated by using it to derive a data distribution for the LU decomposition of a matrix.
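The overall approach can be illustrated with a small sketch. The runtime model, cost constants, and function names below are hypothetical placeholders, not the paper's actual formulas: a parameterized 2-D processor grid stands in for the parameterized data distribution, a simplified cost formula for LU decomposition stands in for the derived runtime formulas, and exhaustive search over grid shapes stands in for the mathematical optimization step.

```python
# Sketch of parameter optimization for a data distribution (assumed model).
# Parameters: the shape (pr, pc) of a 2-D processor grid, pr * pc == p.

def lu_runtime_model(n, pr, pc, t_flop=1e-9, t_startup=1e-5, t_word=1e-8):
    """Modeled LU runtime on a pr x pc grid (illustrative cost terms only)."""
    # Arithmetic, assumed perfectly balanced across all pr * pc processors.
    comp = (2.0 / 3.0) * n**3 * t_flop / (pr * pc)
    # Simplified communication: n broadcasts of pivot columns along each
    # grid dimension; a term vanishes when that dimension has one processor.
    comm = (n * (t_startup + n * t_word / pc) * (pr > 1)
           + n * (t_startup + n * t_word / pr) * (pc > 1))
    return comp + comm

def best_grid(n, p):
    """Return the grid shape (pr, pc) with pr * pc == p minimizing the model."""
    shapes = [(pr, p // pr) for pr in range(1, p + 1) if p % pr == 0]
    return min(shapes, key=lambda s: lu_runtime_model(n, *s))
```

Under this model the computation term is identical for every grid shape, so the optimization trades off only the communication terms; a near-square grid typically wins, which matches the intuition that it minimizes broadcast volume in both dimensions.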