Distributed-memory parallel computers and networks of workstations (NOWs) both rely on efficient communication over increasingly high-speed networks, and software communication protocols are often the performance bottleneck. Several current and proposed parallel systems address this problem by dedicating one general-purpose processor in a symmetric multiprocessor (SMP) node to protocol processing. This approach reduces communication latency and increases effective bandwidth, but it also reduces peak performance, because the dedicated processor no longer performs computation. In this paper, we study a parallel machine with SMP nodes and compare two protocol processing policies: the Fixed policy, which uses a dedicated protocol processor, and the Floating policy, in which all processors perform both computation and protocol processing. Results from synthetic microbenchmarks and five macrobenchmarks show that: (i) a dedicated protocol processor benefits light-weight protocols much more than heavy-weight protocols; (ii) a dedicated protocol processor is generally advantageous when there are four or more processors per node; (iii) multiprocessor-node performance is less sensitive to interrupt overhead than uniprocessor-node performance, because a message arrival is likely to find an idle processor on a multiprocessor node, which eliminates the interrupt; and (iv) the system with the lowest cost/performance will include a dedicated protocol processor when interrupt overhead is much higher than protocol weight, as in light-weight protocols.
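The latency side of the Fixed/Floating trade-off can be illustrated with a toy model. This is a sketch under our own assumptions, not the paper's simulation methodology. The names `NodeConfig`, `protocol_weight`, `interrupt_overhead`, and `p_idle` are hypothetical, times are in arbitrary units, and the model ignores queueing at the protocol processor and the cost of polling. It captures only two effects: Fixed gives up one processor's worth of computation, and Floating pays an interrupt only when no processor is idle.

```python
from dataclasses import dataclass


@dataclass
class NodeConfig:
    """Hypothetical parameters for one SMP node (arbitrary time units)."""
    processors: int            # processors per SMP node
    protocol_weight: float     # time to run the protocol handler for one message
    interrupt_overhead: float  # extra time to interrupt a busy processor


def compute_processors(cfg: NodeConfig, policy: str) -> int:
    """Processors left for application computation under each policy."""
    if policy == "fixed":
        return cfg.processors - 1  # one processor is dedicated to protocol processing
    if policy == "floating":
        return cfg.processors      # every processor computes
    raise ValueError(f"unknown policy: {policy}")


def handling_latency(cfg: NodeConfig, policy: str, idle_processor_available: bool) -> float:
    """Time from message arrival until its handler finishes (toy model, no queueing)."""
    if policy == "fixed":
        # Assumption: the dedicated processor polls and is free when the message
        # arrives, so no interrupt is needed.
        return cfg.protocol_weight
    if policy == "floating":
        if idle_processor_available:
            # An idle processor picks up the message without an interrupt.
            return cfg.protocol_weight
        # Every processor is computing, so one must be interrupted.
        return cfg.interrupt_overhead + cfg.protocol_weight
    raise ValueError(f"unknown policy: {policy}")


def expected_floating_latency(cfg: NodeConfig, p_idle: float) -> float:
    """Expected Floating latency when an arrival finds an idle processor with
    probability p_idle (a hypothetical parameter, not measured in the paper)."""
    if not 0.0 <= p_idle <= 1.0:
        raise ValueError("p_idle must be in [0, 1]")
    return cfg.protocol_weight + (1.0 - p_idle) * cfg.interrupt_overhead


if __name__ == "__main__":
    light = NodeConfig(processors=4, protocol_weight=1.0, interrupt_overhead=10.0)
    heavy = NodeConfig(processors=4, protocol_weight=20.0, interrupt_overhead=10.0)
    for name, cfg in (("light-weight", light), ("heavy-weight", heavy)):
        fixed = handling_latency(cfg, "fixed", idle_processor_available=False)
        floating = handling_latency(cfg, "floating", idle_processor_available=False)
        print(f"{name}: floating/fixed latency when no processor is idle = {floating / fixed}")
    # Fraction of peak compute retained by Fixed on a 4-processor node.
    print(compute_processors(light, "fixed") / compute_processors(light, "floating"))
```

With these made-up numbers, avoiding the interrupt shortens a light-weight handler's latency 11-fold but a heavy-weight handler's only 1.5-fold, in the spirit of finding (i), while Fixed keeps 3 of 4 processors (0.75 of peak) computing. Raising `p_idle` shrinks the Floating penalty, loosely mirroring finding (iii); a uniprocessor node busy with computation corresponds roughly to `p_idle = 0`.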