Appropriate data distribution has been found to be critical for obtaining good performance on distributed memory multicomputers such as the CM-5, Intel Paragon, and IBM SP-1. It has also been found that some programs need to change their distributions during execution (redistribution) for better performance. This work focuses on automatically generating efficient routines for redistribution. We present a new mathematical representation for regular distributions called PITFALLS and then discuss algorithms for redistribution based on this representation. A significant contribution of this work is the ability to handle arbitrary source and target processor sets while performing redistribution; another is the ability to handle arbitrary dimensionality for the array being redistributed in a scalable manner. The results presented show low overheads for our redistribution algorithm as compared to naive runtime methods.
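To make the baseline concrete, the following sketch shows the naive runtime method the abstract compares against: enumerating every array element to build the communication sets for redistributing a one-dimensional array between two block-cyclic distributions over possibly different processor sets. This is an illustrative assumption of the naive approach, not the PITFALLS algorithm itself; the function names and the O(n) enumeration are ours.

```python
def owner(i, b, p):
    """Owning processor of global element i under a cyclic(b)
    distribution over p processors: block index i // b, mod p."""
    return (i // b) % p

def redistribution_sets(n, b_src, p_src, b_dst, p_dst):
    """Naive communication-set computation for redistributing an
    n-element array from cyclic(b_src) over p_src processors to
    cyclic(b_dst) over p_dst processors (possibly a different set).

    Returns a dict mapping (source_proc, target_proc) pairs to the
    list of global indices that pair must exchange. Scanning every
    element costs O(n) per redistribution, which is the overhead
    that representation-based methods like PITFALLS avoid.
    """
    sets = {}
    for i in range(n):
        key = (owner(i, b_src, p_src), owner(i, b_dst, p_dst))
        sets.setdefault(key, []).append(i)
    return sets
```

For example, redistributing an 8-element array from cyclic(1) over 2 processors to cyclic(2) over 2 processors groups indices {0, 4} on the (0, 0) pair and {2, 6} on the (0, 1) pair, so half of processor 0's data must move to processor 1.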