Synchronization without contention
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Operating system support for improving data locality on CC-NUMA compute servers
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Evaluating virtual channels for cache-coherent shared-memory multiprocessors
ICS '96 Proceedings of the 10th international conference on Supercomputing
The Performance of the Cedar Multistage Switching Network
IEEE Transactions on Parallel and Distributed Systems
Performance of Multistage Bus Networks for a Distributed Shared Memory Multiprocessor
IEEE Transactions on Parallel and Distributed Systems
Performance benefits of virtual channels and adaptive routing: an application-driven study
ICS '97 Proceedings of the 11th international conference on Supercomputing
IEEE Transactions on Parallel and Distributed Systems
Symmetric Crossbar Arbiters for VLSI Communication Switches
IEEE Transactions on Parallel and Distributed Systems
The DASH Prototype: Logic Overhead and Performance
IEEE Transactions on Parallel and Distributed Systems
PROTEUS: A HIGH-PERFORMANCE PARALLEL-ARCHITECTURE SIMULATOR
PROTEUS: A HIGH-PERFORMANCE PARALLEL-ARCHITECTURE SIMULATOR
A New Solution to Coherence Problems in Multicache Systems
IEEE Transactions on Computers
Design and Evaluation of a Switch Cache Architecture for CC-NUMA Multiprocessors
IEEE Transactions on Computers
HIPIQS: A High-Performance Switch Architecture Using Input Queuing
IEEE Transactions on Parallel and Distributed Systems
Design and analysis of static memory management policies for CC-NUMA Multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
Hi-index | 0.00 |
In this paper, the effect of switch design on the application peformance of cache-coherent non-uniform memory access (CC-NUMA) multiprocessors is studied in detail. Wormhole routing and cut-through switching are evaluated for these shared-memory multiprocessors that employ multistage interconnection network (MIN) and full map directory-based cache coherence protocol. The switch design also considers virtual channels and varying number of input buffers per switch. Based on this, four different switch architectures are presented and compared. The evaluation is based on execution-driven simulation using five different applications to capture the random bursty nature of the network trafic arrival. The round-robin memory management policy is implemented. We show that the use of cut-through switching with buffers and virtual channels improves the average message latency tremendously. The waiting times of messages at various stages of switches are also presented. Finally, we show the variation of stall times and execution times for these applications by varying the switch delay and wire width.