Fully populated torus-connected networks, in which every node has a processor attached, do not scale well: under heavy communication, the load on edges grows superlinearly with network size, which degrades network throughput. In a partially populated network, processors occupy a subset of the available nodes, and a routing algorithm is specified among the placed processors. As in multistage networks, it is desirable that the total number of messages routed through any particular edge of a toroidal network grow at most linearly with the size of the placement. To this end, we consider placements of processors described by a given placement algorithm parameterized by $k$ and $d$. We show formally that, to achieve linear communication load in a $d$-dimensional $k$-torus, the number of processors in the placement must be equal to $c k^{d-1}$ for some constant $c$. For placements with sufficient symmetries, our approach also gives a tighter lower bound on the maximum load of a placement than existing bounds, for an arbitrary number of dimensions. Based on these results, we give optimal placements and corresponding routing algorithms that achieve linear communication load in tori with an arbitrary number of dimensions.
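The contrast between superlinear and linear edge load can be checked numerically on small tori. The sketch below is illustrative only and is not the paper's placement or routing algorithm: it assumes all-to-all traffic (one message per ordered processor pair), dimension-order routing along the shorter way round each ring, and a hypothetical "diagonal" placement that keeps the $k^{d-1}$ nodes whose coordinates sum to 0 modulo $k$. The function names `max_edge_load`, `full_placement`, and `diagonal_placement` are made up for this example.

```python
from collections import Counter
from itertools import product


def ring_step(a, b, k):
    """Direction (+1 or -1) and hop count for the shorter way from a to b on a k-ring."""
    forward = (b - a) % k
    if forward <= k - forward:
        return 1, forward
    return -1, k - forward


def max_edge_load(processors, k, d):
    """Maximum number of messages crossing any directed edge of a d-dimensional k-torus.

    Assumptions (not taken from the paper): every ordered pair of distinct
    processors exchanges one message, routed dimension by dimension.
    """
    load = Counter()
    for src in processors:
        for dst in processors:
            if src == dst:
                continue
            cur = list(src)
            for dim in range(d):
                step, hops = ring_step(cur[dim], dst[dim], k)
                for _ in range(hops):
                    # A directed edge is identified by its tail node, dimension, and direction.
                    load[(tuple(cur), dim, step)] += 1
                    cur[dim] = (cur[dim] + step) % k
    return max(load.values())


def full_placement(k, d):
    """Every node carries a processor: k**d processors."""
    return list(product(range(k), repeat=d))


def diagonal_placement(k, d):
    """Hypothetical partial placement: nodes whose coordinates sum to 0 mod k (k**(d-1) processors)."""
    return [p for p in product(range(k), repeat=d) if sum(p) % k == 0]


if __name__ == "__main__":
    d = 2
    for k in (3, 5, 7):  # odd k avoids ties between the two ring directions
        for name, place in (("full", full_placement), ("diagonal", diagonal_placement)):
            procs = place(k, d)
            load = max_edge_load(procs, k, d)
            print(f"k={k} {name:8s} processors={len(procs):3d} "
                  f"max load={load:3d} load/processor={load / len(procs):.3f}")
```

Under these assumptions, the load-per-processor ratio keeps growing with $k$ for the full placement, while it stays bounded for the diagonal placement, which is the behavior the abstract describes for placements of size $c k^{d-1}$.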