Clustered or hierarchical interconnections offer advantages in designing large-scale multiprocessor systems. Earlier studies in the literature have either focused only on flat interconnections or proposed hierarchical/clustered interconnections under limited packaging and demanded-performance constraints. Large systems require several levels of packaging, and packaging technologies impose physical constraints on the bisection bandwidth and channel width of a system. Pinout technologies and the capacity of packaging modules have been ignored in earlier studies, often leading to configurations that are not design-feasible. Similarly, the impact of processor and interconnect technologies on demanded performance has not been considered. In this paper, we propose a new supply-demand framework for multiprocessor system design that considers packaging, processor, and interconnect technologies in an integrated manner. The strength of this framework lies in its parameterized representation of different technologies. For a given set of technological parameters, the framework derives the best configuration while accounting for practical design aspects such as maximum board area, maximum available pinout, fixed channel width, and scalability. To build a scalable parallel system with a given number of processors, the framework explores the design space of flat k-ary n-cube topologies and their clustered variations (k-ary n-cube cluster-c) to derive design-feasible configurations with the best system performance. The study identifies processor board area, supported channel width, board pinout density, and router pinout as critical parameters and analyzes their impact on deriving design-feasible and best configurations. For a wide range of parameters, the best configurations are shown to be cluster-based systems with up to 8 processors per cluster and 3D-5D intercluster interconnection.
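The design-space exploration described above can be illustrated with a minimal sketch. It is not the paper's framework. It only enumerates k-ary n-cube cluster-c configurations whose processor count, c * k^n, equals a target, and then applies a toy router-pinout check. The names `Config`, `enumerate_configs`, and `fits_router_pinout`, and the pin-count model (two network ports per dimension plus one port per local processor, each `channel_width` pins wide), are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    k: int  # radix: nodes along each dimension
    n: int  # number of dimensions
    c: int  # processors per cluster (c == 1 is the flat case)


def enumerate_configs(num_processors: int, max_cluster: int = 8) -> List[Config]:
    """List every (k, n, c) with k >= 2 and c * k**n == num_processors."""
    configs = []
    for c in range(1, max_cluster + 1):
        if num_processors % c:
            continue  # cluster size must divide the processor count
        clusters = num_processors // c
        n = 1
        while 2 ** n <= clusters:
            # Guess the integer n-th root, then verify exactly with integers
            # so floating-point rounding cannot produce a wrong answer.
            guess = round(clusters ** (1.0 / n))
            for k in (guess - 1, guess, guess + 1):
                if k >= 2 and k ** n == clusters:
                    configs.append(Config(k, n, c))
            n += 1
    return configs


def fits_router_pinout(cfg: Config, channel_width: int, max_router_pins: int) -> bool:
    """Toy feasibility check (an assumption, not the paper's model):
    2 network ports per dimension plus one port per local processor."""
    ports = 2 * cfg.n + cfg.c
    return ports * channel_width <= max_router_pins


if __name__ == "__main__":
    for cfg in enumerate_configs(1024):
        ok = fits_router_pinout(cfg, channel_width=16, max_router_pins=200)
        print(cfg, "feasible" if ok else "exceeds router pinout")
```

A real study would replace the single pinout check with the full set of constraints the abstract lists (board area, board pinout density, channel width) and rank the surviving configurations by a performance model.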