Consider a message-passing system of n processors in which each processor initially holds one piece of data. The goal is to compute an associative and commutative reduction function over the n pieces of data and to make the result known to all n processors. This operation is used frequently in message-passing systems and is commonly referred to as global combine, census computation, or gossiping. This paper explores the global combine problem in the multiport postal model. The model is characterized by three parameters: n, the number of processors; k, the number of ports per processor; and λ, the communication latency. In every round r, each processor can send k distinct messages to k other processors, and it can receive k messages that were sent by k other processors λ − 1 rounds earlier. This paper provides an optimal algorithm for the global combine problem that requires the least number of communication rounds and minimizes the time any processor spends sending and receiving messages.
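To make the model concrete, the sketch below simulates it round by round. It assumes that a message sent in round r is received in round r + λ − 1 and can be forwarded from round r + λ on. It also checks the multiport limit of k sends and k receives per processor per round. The combine schedule is a simple reduce-then-broadcast over a k-ary heap-shaped tree. This is a baseline chosen for illustration, not the paper's optimal algorithm. The helper `rounds_lower_bound` encodes a standard counting argument that is stated here as an assumption rather than quoted from the paper. Every input must reach every processor, and one processor's data can reach at most reach(t) processors after t rounds. All names (`tree_global_combine`, `tree_children`, `rounds_lower_bound`) are hypothetical.

```python
from collections import defaultdict
from operator import add
from typing import Any, Callable, List, Sequence, Tuple


def tree_children(i: int, n: int, k: int) -> List[int]:
    """Children of processor i when the n processors form a k-ary heap (assumed layout)."""
    return [c for c in range(k * i + 1, k * i + k + 1) if c < n]


def tree_global_combine(data: Sequence[Any], k: int, lam: int,
                        op: Callable[[Any, Any], Any] = add) -> Tuple[List[Any], int]:
    """Reduce-then-broadcast on a k-ary tree, simulated in the multiport postal model.

    Assumed timing: a message sent in round r is received in round r + lam - 1
    and can be forwarded from round r + lam on. Returns the value every
    processor ends with and the number of rounds used.
    """
    n = len(data)
    if n == 0 or k < 1 or lam < 1:
        raise ValueError("need n >= 1, k >= 1 and lam >= 1")
    partial = list(data)                      # combine of own subtree so far
    waiting = [len(tree_children(i, n, k)) for i in range(n)]  # children yet to report
    result: List[Any] = [None] * n
    known = [False] * n
    sent_up = [False] * n
    sent_down = [False] * n
    in_flight = defaultdict(list)             # delivery round -> [(dst, kind, value)]
    r = 0
    while True:
        # The root holds the full result once every child has reported.
        if waiting[0] == 0 and not known[0]:
            result[0], known[0] = partial[0], True
        if all(known):
            return result, r
        # Send phase: decisions use state as of the end of round r - 1.
        sends = [0] * n
        for i in range(n):
            if i != 0 and waiting[i] == 0 and not sent_up[i]:
                in_flight[r + lam - 1].append(((i - 1) // k, "up", partial[i]))
                sent_up[i] = True
                sends[i] += 1
            if known[i] and not sent_down[i]:
                for c in tree_children(i, n, k):
                    in_flight[r + lam - 1].append((c, "down", result[i]))
                    sends[i] += 1
                sent_down[i] = True
        # Receive phase: deliver the messages that were sent lam - 1 rounds ago.
        recvs = [0] * n
        for dst, kind, value in in_flight.pop(r, []):
            recvs[dst] += 1
            if kind == "up":
                partial[dst] = op(partial[dst], value)
                waiting[dst] -= 1
            else:
                result[dst], known[dst] = value, True
        # Multiport constraint: at most k sends and k receives per processor per round.
        assert max(sends) <= k and max(recvs) <= k
        r += 1


def rounds_lower_bound(n: int, k: int, lam: int) -> int:
    """Smallest t with reach(t) >= n, where reach(t) bounds how many processors
    one input can influence after t rounds (assumed counting argument):
    reach(t) = 1 for t < lam, else reach(t - 1) + k * reach(t - lam)."""
    reach: List[int] = []
    t = 0
    while True:
        c = 1 if t < lam else reach[t - 1] + k * reach[t - lam]
        if c >= n:
            return t
        reach.append(c)
        t += 1
```

For example, `tree_global_combine(list(range(1, 8)), k=2, lam=1)` leaves every processor holding 28 after 4 rounds, while `rounds_lower_bound(7, 2, 1)` is 2. The bound is derived from broadcast alone and need not be tight. The difference therefore only limits how much any schedule could improve on the tree, which takes 2 · depth · λ rounds because each level pays the full latency twice.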