We propose a technique that exploits the topology of SMP clusters to optimize the performance of applications that use distributed dense arrays and exhibit a nearest-neighbor communication profile. The topological information is used to map array tiles to processors so as to reduce network communication and make better use of shared memory for inter-process communication. The potential benefits of this SMP-aware mapping are demonstrated through a simulation and through a real application that solves a wind-driven ocean circulation model on an IBM SP. On 256 processors, the execution time was reduced by almost 30 percent without any changes to the original application source code. The proposed mapping approach is applicable to multiple programming models and distributed array management systems.
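The abstract does not spell out the mapping algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a 2-D grid of array tiles with one tile per process, a 4-point nearest-neighbor exchange without wraparound, and ranks placed on SMP nodes in consecutive groups of `ppn`. The function names and the 4×4 block shape per node are hypothetical choices for illustration. The sketch compares a default row-major tile-to-rank mapping with a block mapping in which each node owns a contiguous sub-block of tiles, and counts how many neighbor exchanges must cross the network.

```python
from itertools import product


def row_major_rank(i, j, px, py):
    """Default mapping: tile (i, j) of a px-by-py tile grid goes to rank i*py + j."""
    return i * py + j


def smp_aware_rank(i, j, px, py, bx, by):
    """Hypothetical SMP-aware mapping: the ppn = bx*by consecutive ranks of one
    node own one contiguous bx-by-by block of tiles, so most neighbours share a node."""
    assert px % bx == 0 and py % by == 0, "tile grid must divide into node blocks"
    ppn = bx * by
    node = (i // bx) * (py // by) + (j // by)  # which node owns this tile's block
    local = (i % bx) * by + (j % by)           # position of the tile inside the block
    return node * ppn + local


def inter_node_edges(px, py, ppn, rank_of):
    """Count nearest-neighbour tile pairs (non-periodic) whose owning ranks are on
    different nodes, assuming ranks r and s share a node iff r // ppn == s // ppn."""
    crossings = 0
    for i, j in product(range(px), range(py)):
        for di, dj in ((1, 0), (0, 1)):        # visit each edge exactly once
            ni, nj = i + di, j + dj
            if ni < px and nj < py:
                if rank_of(i, j) // ppn != rank_of(ni, nj) // ppn:
                    crossings += 1
    return crossings


# Illustrative setup: 256 processes on a 16x16 tile grid, 16 processes per node
# (the per-node count is an assumption, not taken from the paper).
px = py = 16
bx = by = 4
ppn = bx * by
naive = inter_node_edges(px, py, ppn, lambda i, j: row_major_rank(i, j, px, py))
aware = inter_node_edges(px, py, ppn, lambda i, j: smp_aware_rank(i, j, px, py, bx, by))
print(naive, aware)  # 240 96
```

With row-major placement each node holds one full row of tiles, so every vertical exchange (240 of them) crosses the network. The block placement leaves only the 96 exchanges on block boundaries off-node and turns the rest into shared-memory traffic. Because only the tile-to-rank table changes, a distributed array library could in principle apply such a mapping without touching application code, which is in the spirit of the abstract's claim.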