The performance of a program on a distributed-memory parallel computer depends on the algorithms employed, the structure and speed of the machine's communication network, and the way in which data are distributed among the processors. This paper addresses the last of these concerns, the problem of data mapping. The paper describes and evaluates a system that automatically determines efficient ways of mapping data onto processors. The system is applicable and effective across a variety of architectures. Simulation results for machines with different interconnection schemes, including linear arrays, two-dimensional meshes, and hypercubes, together with measured running times on the CM-2, show that good data mapping often improves performance by at least 20% and in some cases by more than a factor of two.
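A data mapping assigns array elements to processors. As a concrete illustration (a minimal sketch of two standard distributions, not the automatic system described in the paper), the block and cyclic mappings of a one-dimensional array onto P processors can be written as:

```python
# Hypothetical illustration: two common ways of mapping the n elements
# of a one-dimensional array onto p processors.

def block_mapping(n, p):
    """Assign element i to a processor by contiguous blocks."""
    block = (n + p - 1) // p          # ceiling division: elements per processor
    return [i // block for i in range(n)]

def cyclic_mapping(n, p):
    """Assign element i to processor i mod p (round-robin)."""
    return [i % p for i in range(n)]

if __name__ == "__main__":
    n, p = 8, 4
    print(block_mapping(n, p))        # [0, 0, 1, 1, 2, 2, 3, 3]
    print(cyclic_mapping(n, p))       # [0, 1, 2, 3, 0, 1, 2, 3]
```

Which mapping performs better depends on the access pattern: block distributions favor nearest-neighbor communication (as on meshes), while cyclic distributions balance load when work is concentrated in a shifting sub-range of the array.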