This paper presents a family of algorithms for producing, from $(\upsilon_0, \upsilon_1, \ldots, \upsilon_{n-1})$, all initial prefixes $x_i = \upsilon_0 \,\theta\, \upsilon_1 \,\theta\, \cdots \,\theta\, \upsilon_i$ $(i = 0, 1, \ldots, n-1)$ in parallel in interconnection networks such as the omega network and the hypercube, where $\theta$ is an associative binary operator. Each algorithm can be embedded in the switches and interconnections of the network, and can be executed in $O((\log_2 r + 1) \log_r n)$ time steps, provided that the network connecting $n$ processors is constructed from $r \times r$ switches and that parallelism within as well as among individual switches is exploited. The objective of these algorithms is to attain a communication pattern that fits the topology of the network. One type of network can be made equivalent to, or can be embedded in, another type of network, so a family of algorithms can be derived from one basic algorithm. In the basic algorithm, every processor $p_i$ upward multicasts $\upsilon_i$ to processors $p_k$ $(k = i+1, i+2, \ldots, n-1)$. En route to $p_i$, the values $\upsilon_j$ $(j = 0, 1, \ldots, i-1)$ are combined in the switches to produce the $(i-1)$th initial prefix $x_{i-1}$, which is received by $p_i$; $p_i$ can then compute the $i$th initial prefix $x_i = x_{i-1} \,\theta\, \upsilon_i$.
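To make the definition of the initial prefixes concrete, the following is a minimal Python sketch. It is not the switch-embedded algorithm of the paper: `prefixes_by_definition` simply applies the recurrence $x_i = x_{i-1} \,\theta\, \upsilon_i$, and `hypercube_prefixes` simulates the standard textbook hypercube scan, in which processor $i$ exchanges a running subcube total with partner $i \oplus 2^d$ in each of the $\log_2 n$ dimensions. The function names, the `op` parameter (standing in for $\theta$), and the restriction to $n$ a power of two are assumptions made for illustration.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def prefixes_by_definition(values: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Reference result: x_0 = v_0 and x_i = x_{i-1} op v_i."""
    out: List[T] = []
    for i, v in enumerate(values):
        out.append(v if i == 0 else op(out[i - 1], v))
    return out


def hypercube_prefixes(values: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Simulate a textbook hypercube scan (assumed illustration, not the
    paper's algorithm). `op` must be associative; it need not be commutative.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")

    prefix = list(values)  # prefix[i]: combination of values <= i in i's subcube
    total = list(values)   # total[i]: combination of all values in i's subcube

    d = 1
    while d < n:
        # One parallel step: every processor reads its partner's old total.
        received = [total[i ^ d] for i in range(n)]
        for i in range(n):
            if i & d:
                # Partner's subcube holds lower indices, so it goes on the left.
                prefix[i] = op(received[i], prefix[i])
                total[i] = op(received[i], total[i])
            else:
                # Partner's subcube holds higher indices: only the total grows.
                total[i] = op(total[i], received[i])
        d <<= 1
    return prefix


if __name__ == "__main__":
    concat = lambda a, b: a + b  # associative but not commutative
    print(hypercube_prefixes(list("abcd"), concat))
    print(prefixes_by_definition(list("abcd"), concat))
```

String concatenation is used in the usage example because it is associative but not commutative, so any error in operand order would show up in the output. The simulation uses $\log_2 n$ exchange steps; for comparison, the bound stated in the abstract evaluates to $2 \log_2 n$ when $r = 2$.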