Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs
Journal of Parallel and Distributed Computing
Distributed-memory parallel computers and networks of workstations (NOWs) both rely on efficient communication over increasingly high-speed networks. Software communication protocols are often the performance bottleneck. Several current and proposed parallel systems address this problem by dedicating one general-purpose processor in a symmetric multiprocessor (SMP) node specifically to protocol processing. This scheduling convention reduces communication latency and increases effective bandwidth, but it also reduces peak performance, since the dedicated processor no longer performs computation. In this paper, we study a parallel machine with SMP nodes and compare two protocol-processing policies: Fixed, which uses a dedicated protocol processor; and Floating, in which all processors perform both computation and protocol processing. The results from synthetic microbenchmarks and five macrobenchmarks show that: i) a dedicated protocol processor benefits lightweight protocols much more than heavyweight protocols; ii) Fixed outperforms Floating when communication becomes the bottleneck, which is more likely when the application is very communication-intensive, overheads are very high, or there are multiple (i.e., more than two) processors per node; iii) a system with optimal cost-effectiveness is likely to include a dedicated protocol processor, at least for lightweight protocols.
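The trade-off between the two policies can be illustrated with a back-of-the-envelope model. The sketch below is a hypothetical illustration, not the paper's actual simulation methodology: it assumes a node's completion time is bounded by its slowest resource, that under Fixed the dedicated processor overlaps protocol work with computation on the remaining processors, and that under Floating every unit of protocol work also pays a scheduling cost (`switch_cost`, an assumed parameter) for interrupting computation.

```python
def completion_time(policy, n_procs, compute_work, protocol_work, switch_cost=0.0):
    """Estimate node completion time (arbitrary time units).

    Fixed:    one processor is dedicated to protocol handling; the other
              n_procs - 1 processors share the compute work, and protocol
              work proceeds in parallel on the dedicated processor.
    Floating: all n_procs processors share both compute and protocol work,
              but each unit of protocol work incurs an extra fractional
              scheduling/switch cost because it interrupts computation.
    """
    if policy == "fixed":
        compute_time = compute_work / (n_procs - 1)
        protocol_time = protocol_work  # fully overlapped on the dedicated CPU
        return max(compute_time, protocol_time)
    if policy == "floating":
        total = compute_work + protocol_work * (1 + switch_cost)
        return total / n_procs
    raise ValueError(f"unknown policy: {policy}")


# Communication-intensive workload: Fixed wins by hiding protocol work.
print(completion_time("fixed", 4, compute_work=90, protocol_work=30, switch_cost=0.2))     # 30.0
print(completion_time("floating", 4, compute_work=90, protocol_work=30, switch_cost=0.2))  # 31.5

# Computation-dominated workload: Floating wins by using all four processors.
print(completion_time("fixed", 4, compute_work=90, protocol_work=6, switch_cost=0.2))      # 30.0
print(completion_time("floating", 4, compute_work=90, protocol_work=6, switch_cost=0.2))   # 24.3
```

Even this crude model reproduces the qualitative findings: dedicating a processor pays off when protocol work is plentiful relative to computation (or when switch costs are high), while Floating is preferable when the dedicated processor would mostly sit idle.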