Practical parallel algorithms for personalized communication and integer sorting
Journal of Experimental Algorithmics (JEA)
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Parallel construction of multidimensional binary search trees
ICS '96 Proceedings of the 10th international conference on Supercomputing
Practical Algorithms for Selection on Coarse-Grained Parallel Computers
IEEE Transactions on Parallel and Distributed Systems
Scalable S-To-P Broadcasting on Message-Passing MPPs
IEEE Transactions on Parallel and Distributed Systems
Parallel Construction of Multidimensional Binary Search Trees
IEEE Transactions on Parallel and Distributed Systems
Effects of communication characteristics on task mapping quality on a 2-D mesh with wormhole routing
SAC '00 Proceedings of the 2000 ACM symposium on Applied computing - Volume 2
Parallel hierarchical solvers and preconditioners for boundary element methods
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
A Framework for Efficient Data Redistribution on Distributed Memory Multicomputers
The Journal of Supercomputing
Efficient Parallel Algorithms for Solvent Accessible Surface Area of Proteins
IEEE Transactions on Parallel and Distributed Systems
Portable and scalable algorithm for irregular all-to-all communication
Journal of Parallel and Distributed Computing
HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
Software Techniques for Improving MPP Bulk-Transfer Performance
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Practical Algorithms for Selection on Coarse-Grained Parallel Computers
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
A3: a simple and asymptotically accurate model for parallel computation
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Efficient Multiple Multicast on Heterogeneous Network of Workstations
The Journal of Supercomputing
Research note: Parallel algorithms for tree accumulations
Journal of Parallel and Distributed Computing
Exchanging messages of different sizes
Journal of Parallel and Distributed Computing
PDCN'06 Proceedings of the 24th IASTED international conference on Parallel and distributed computing and networks
A dominant input stream for LUD incremental computing on a contention network
ICA3PP'07 Proceedings of the 7th international conference on Algorithms and architectures for parallel processing
This paper presents solutions to the problem of many-to-many personalized communication, with bounded incoming and outgoing traffic, on a distributed-memory parallel machine. We present a two-stage algorithm that decomposes a many-to-many communication with possibly high variance in message size into two communications with low variance in message size. The algorithm is deterministic and takes time $2t\mu$ (+ lower-order terms) when $t \geq O(p^2 + p\tau/\mu)$. Here $t$ is the maximum outgoing or incoming traffic at any processor, $\tau$ is the startup overhead, and $\mu$ is the inverse of the data transfer rate. Optimality is achieved when the traffic is large, a condition that is usually satisfied in practice on coarse-grained architectures. The algorithm was implemented on the Connection Machine CM-5. The implementation used the low-latency communication primitives (active messages) available on the CM-5, but the algorithm itself is architecture-independent. An alternative single-stage algorithm using distributed random scheduling was also implemented for the CM-5, and the performance of the two algorithms was compared.
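The decomposition idea can be illustrated with a small Python sketch that tracks message sizes only. It is not the paper's algorithm, whose details the abstract does not give. It assumes one simple scheme: each message is cut into p near-equal pieces, and piece k travels through processor k as an intermediate. The names `split_evenly`, `two_stage_sizes`, `spread`, and the sample traffic matrix are hypothetical and chosen only for this example.

```python
def split_evenly(n, p, offset):
    """Split n units into p pieces whose sizes differ by at most 1.

    The leftover units (n mod p) go to consecutive pieces starting at
    `offset`, so that leftovers from different messages do not all pile
    up on the same intermediate processor.
    """
    base, extra = divmod(n, p)
    pieces = [base] * p
    for r in range(extra):
        pieces[(offset + r) % p] += 1
    return pieces


def two_stage_sizes(traffic):
    """Return the message-size matrices of the two stages.

    traffic[i][j] is the number of units processor i must deliver to j.
    stage1[i][k] is what i sends to intermediate k in the first stage;
    stage2[k][j] is what k forwards to final destination j in the second.
    Assumed scheme for illustration only, not the authors' method.
    """
    p = len(traffic)
    stage1 = [[0] * p for _ in range(p)]
    stage2 = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            pieces = split_evenly(traffic[i][j], p, i + j)
            for k, piece in enumerate(pieces):
                stage1[i][k] += piece
                stage2[k][j] += piece
    return stage1, stage2


def spread(matrix):
    """Difference between the largest and smallest message size."""
    flat = [x for row in matrix for x in row]
    return max(flat) - min(flat)


# Hypothetical traffic with highly variable message sizes (0 to 100 units).
traffic = [
    [0, 90, 0, 10],
    [5, 0, 95, 0],
    [100, 0, 0, 0],
    [0, 10, 5, 85],
]

if __name__ == "__main__":
    stage1, stage2 = two_stage_sizes(traffic)
    print("direct spread: ", spread(traffic))
    print("stage 1 spread:", spread(stage1))
    print("stage 2 spread:", spread(stage2))
```

In this sketch every unit of data is sent twice, once per stage, which is consistent with the factor of 2 in the stated $2t\mu$ running time. The variation in message size within each stage is much smaller than in the direct exchange.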