Algorithms in combinatorial geometry
Computer graphics: principles and practice (2nd ed.)
Automatic data partitioning on distributed memory multicomputers
Early prediction of MPP performance: the SP2, T3D, and Paragon experiences
Parallel Computing
The ADDAP system on the iPSC/860: automatic data distribution and parallelization
Journal of Parallel and Distributed Computing
Automatic data layout for distributed memory machines
A shared-memory implementation of the hierarchical radiosity method
Theoretical Computer Science - Special issue on parallel computing
Tools and techniques for automatic data layout: a case study
Parallel Computing - Special issues on languages and compilers for parallel computers
Cache miss equations: a compiler framework for analyzing and tuning memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
Building programs in the network of tasks model
SAC '00 Proceedings of the 2000 ACM symposium on Applied computing - Volume 1
A Transformation Approach to Derive Efficient Parallel Implementations
IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools parallel processing
Compiler techniques for optimizing communication and data distribution for distributed-memory multicomputers
We consider parallel programs that are composed of a set of data-parallel modules. For an execution on a distributed memory machine (DMM), each data-parallel module has to use a data distribution for its variables. If cooperating modules are based on different data distributions for the same variable, data redistributions have to be performed between the activations of the modules, and thus additional time for communication is needed. In this paper, we assume that each of the modules is available in several different parallel realizations using different data distributions. We address the question of how to select the realizations of the data-parallel modules that result in the smallest overall execution time of the entire program. We describe a cost-based method that determines data distributions for the different modules while taking redistributions into consideration. In particular, we concentrate on unbounded loops. Computation and communication costs, as well as the costs for redistributions between cooperating modules, are modelled by cost functions.
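The selection problem described in the abstract can be illustrated for the simple case of a linear sequence of modules: pick one realization per module so that the sum of computation costs and redistribution costs between consecutive modules is minimal. The following sketch is an illustrative assumption, not the paper's actual method; the function name, the table-based cost model, and the restriction to a module chain are all hypothetical, and a dynamic-programming pass over the chain stands in for the paper's cost-based selection.

```python
# Hypothetical sketch: choose one realization (data distribution) per module
# in a linear module sequence, minimizing computation + redistribution cost.
# Cost tables stand in for the paper's cost functions; all values are made up.

def select_realizations(exec_cost, redist_cost):
    """
    exec_cost[i][r]        : cost of running module i with realization r
    redist_cost[i][r][s]   : cost of redistributing data between realization r
                             of module i and realization s of module i+1
    Returns (minimal total cost, chosen realization index per module).
    """
    n = len(exec_cost)
    # best[r]: minimal cost of executing modules 0..i when module i uses r
    best = list(exec_cost[0])
    back = []  # backpointers for reconstructing the chosen realizations
    for i in range(1, n):
        new_best, new_back = [], []
        for s in range(len(exec_cost[i])):
            # cheapest way to arrive at realization s of module i
            cost, r = min((best[r] + redist_cost[i - 1][r][s], r)
                          for r in range(len(exec_cost[i - 1])))
            new_best.append(cost + exec_cost[i][s])
            new_back.append(r)
        best, back_i = new_best, new_back
        back.append(back_i)
    # reconstruct the realization choice for every module
    total = min(best)
    s = best.index(total)
    picks = [s]
    for i in range(n - 1, 0, -1):
        s = back[i - 1][s]
        picks.append(s)
    picks.reverse()
    return total, picks

# Two modules with two realizations each: taking the slower realization of
# module 0 avoids a redistribution and gives the smaller overall time.
exec_cost = [[10, 12], [8, 5]]
redist_cost = [[[0, 4], [3, 0]]]
print(select_realizations(exec_cost, redist_cost))  # → (17, [1, 1])
```

The example shows why a purely local choice can be suboptimal: realization 0 of module 0 is cheaper in isolation, but the redistribution it forces before module 1 makes the combined time larger.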