Parallel applications and architectures utilizing bi-directional rhombic interconnection networks with circular and segmented buses
This paper extends research on rhombic overlapping-connectivity interconnection networks into the area of parallel applications. As the foundation for a shared-memory, non-uniform-access, bus-based multiprocessor, these interconnection networks create overlapping groups of processors, buses, and memories, forming a clustered computer architecture in which the clusters overlap. This overlapping-membership characteristic is shown to be useful for matching a parallel application's communication topology to the architecture's bandwidth characteristics. Many parallel applications can be mapped onto the architecture topology so that most or all communication is localized within an overlapping cluster, at the low latency of direct processor-to-cache (or processor-to-memory) access over a bus. Consequently, the latency of communication between parallel threads neither degrades parallel performance nor limits the granularity of applications. Parallel applications can execute with good speedup and scaling on a proposed architecture that is designed to obtain maximum advantage from the overlapping-cluster characteristic and that also allows dynamic workload migration without moving instructions or data. The scalability limitations of bus-based shared-memory multiprocessors are overcome by judicious workload-allocation schemes that take advantage of the overlapping cluster memberships. Bus-based rhombic shared-memory multiprocessors are examined in terms of parallel speedup models to explain their advantages and to justify their use as the foundation for the proposed computer architecture. Interconnection bandwidth is maximized with bi-directional circular and segmented overlapping buses. Strategies for mapping parallel application communication topologies onto rhombic architectures are developed. Analytical models of enhanced rhombic multiprocessor performance are developed with a unique bandwidth-modeling technique and compared with the results of simulation.
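The overlapping-cluster idea and its effect on speedup can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not the paper's actual architecture or analytical model: clusters are taken to be fixed-size groups of processors placed at a regular stride around a ring, so that adjacent clusters share members, and the latency and workload parameters are invented for demonstration. It shows how a nearest-neighbor application mapping keeps all communication inside a shared cluster (low latency), and how a simple latency-aware speedup model then favors localized communication.

```python
# Illustrative sketch only: cluster layout, latencies, and the speedup
# formula below are hypothetical, chosen to demonstrate the idea of
# overlapping-cluster locality rather than the paper's real model.

def clusters_of(p, n_procs, cluster_size, stride):
    """Indices of the overlapping clusters containing processor p.

    Cluster c is assumed to span processors
    c*stride .. c*stride + cluster_size - 1 (mod n_procs),
    so with stride < cluster_size, neighboring clusters overlap.
    """
    n_clusters = n_procs // stride
    return [c for c in range(n_clusters)
            if (p - c * stride) % n_procs < cluster_size]

def shares_cluster(p, q, n_procs, cluster_size, stride):
    """True if some cluster contains both p and q (communication is local)."""
    a = set(clusters_of(p, n_procs, cluster_size, stride))
    b = set(clusters_of(q, n_procs, cluster_size, stride))
    return bool(a & b)

def speedup(n_procs, work, comm_msgs, f_local, t_local=1.0, t_remote=20.0):
    """Toy speedup model: per-processor compute time plus communication
    time, where a fraction f_local of messages stay within an overlapping
    cluster (cheap bus access) and the rest pay a remote penalty."""
    t_parallel = (work / n_procs
                  + comm_msgs * (f_local * t_local + (1 - f_local) * t_remote))
    return work / t_parallel

# Nearest neighbors on the ring always share a cluster here, so a
# nearest-neighbor application can run with f_local = 1.
print(shares_cluster(0, 1, 16, 4, 2))   # neighbors: local
print(shares_cluster(0, 8, 16, 4, 2))   # distant pair: not local
print(speedup(16, 16000, 100, f_local=1.0))
print(speedup(16, 16000, 100, f_local=0.0))
```

With these toy numbers, the fully localized mapping achieves markedly higher speedup than one whose communication all crosses cluster boundaries, which is the qualitative point the abstract makes about matching application topology to the overlapping-cluster structure.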