Proceedings of the 1990 ACM/IEEE conference on Supercomputing
The turn model for adaptive routing
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
On the parallel implementation of Goldberg's maximum flow algorithm
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
Architectural requirements of parallel scientific applications with explicit communication
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
A comparison of adaptive wormhole routing algorithms
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks
IEEE Transactions on Parallel and Distributed Systems
The interaction between virtual channel flow control and adaptive routing in wormhole networks
ICS '94 Proceedings of the 8th international conference on Supercomputing
An approach to scalability study of shared memory parallel systems
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Adaptive Deadlock- and Livelock-Free Routing with All Minimal Paths in Torus Networks
IEEE Transactions on Parallel and Distributed Systems
Software overhead in messaging layers: where does the time go?
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Planar-adaptive routing: low-cost adaptive networks for multiprocessors
Journal of the ACM (JACM)
IBM Systems Journal
Evaluating virtual channels for cache-coherent shared-memory multiprocessors
ICS '96 Proceedings of the 10th international conference on Supercomputing
PP-MESS-SIM: A Flexible and Extensible Simulator for Evaluating Multicomputer Networks
IEEE Transactions on Parallel and Distributed Systems
Introduction to process-oriented simulation and CSIM (tutorial session)
WSC' 90 Proceedings of the 22nd conference on Winter simulation
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Interconnection Networks: An Engineering Approach
Interconnection Networks: An Engineering Approach
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels
IEEE Transactions on Parallel and Distributed Systems
Alleviating Consumption Channel Bottleneck in Wormhole-Routed k-ary n-Cube Systems
IEEE Transactions on Parallel and Distributed Systems
How Much Does Network Contention Affect Distributed Shared Memory Performance?
ICPP '97 Proceedings of the international Conference on Parallel Processing
Accuracy vs. performance in parallel simulation of interconnection networks
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Towards a Communication Characterization Methodology for Parallel Applications
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
PROTEUS: A HIGH-PERFORMANCE PARALLEL-ARCHITECTURE SIMULATOR
PROTEUS: A HIGH-PERFORMANCE PARALLEL-ARCHITECTURE SIMULATOR
SPLASH: Stanford parallel applications for shared-memory
SPLASH: Stanford parallel applications for shared-memory
Modeling Latency in Deterministic Wormhole-Routed Hypercubes under Hot-Spot Traffic
The Journal of Supercomputing
Analytical Modelling of Hot-Spot Traffic in Deterministically-Routed K-Ary N-Cubes
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 15 - Volume 16
A survey of research and practices of Network-on-chip
ACM Computing Surveys (CSUR)
Future Generation Computer Systems
Hi-index | 0.00 |
Research on multiprocessor interconnection networks has primarily focused on wormhole switching, virtual channel flow control, and routing algorithms to enhance their performance. The rationale behind this research is that by alleviating the network latency for high network loads, the overall system performance would improve. Many studies have used synthetic workloads to support this claim. However, such workloads may not necessarily capture the behavior of real applications. In this paper, we have used parallel applications for a closer examination of the network behavior. In particular, the performance benefit from enhancing a 2D mesh with virtual channels (VCs) and a fully adaptive routing algorithm is examined with a set of shared-memory and message passing applications. Execution time and average message latency of shared memory applications are measured using execution-driven simulation and by varying many architectural attributes that affect the network workload. The communication traces of message passing applications, collected on an IBM-SP2, are used to run a trace-driven simulation of the mesh architecture to obtain message latency. Simulation results show that VCs and adaptive routing can reduce the network latency to varying degrees depending on the application. However, these modest benefits do not translate to significant improvements in the overall execution time because the load on the network is not high enough to exploit the advantages of the network enhancements. Moreover, this benefit may be negated if the architectural enhancements increase the network cycle time. Rather, emphasis should be placed on improving the raw network bandwidth and faster network interfaces.