An interconnect for a high-performance cluster must be optimized for both high throughput and low latency. To avoid the tradeoff between the two, the cluster interconnect Clint has a segregated architecture that provides two physically separate transmission channels: a bulk channel optimized for high-bandwidth traffic and a quick channel optimized for low-latency traffic. Each channel uses a different scheduling strategy. The bulk channel uses a scheduler that globally allocates time slots on the transmission paths before packets are sent, which avoids both collisions and blockages. In contrast, the quick channel takes a best-effort approach, sending packets as soon as they are available and thereby risking collisions and retransmissions. Clint is targeted specifically at small- to medium-sized clusters, offering a low-cost alternative to symmetric multiprocessor (SMP) systems. This design point allows for a simple and cost-effective implementation. In particular, because packets are buffered only on the hosts and the switches require no buffer memory, protocols are simplified, since switch forwarding delays are fixed, and throughput is optimized, since a global schedule becomes possible.
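The contrast between the two channels can be illustrated with a small sketch. It is not Clint's actual scheduler: the function names `schedule_bulk` and `send_quick`, the greedy first-fit slot assignment, and the rule that the first packet to a destination wins a collision are all assumptions made for illustration. Packets are modeled as hypothetical `(source_port, destination_port)` pairs.

```python
def schedule_bulk(requests):
    """Bulk-channel sketch: assign each (src, dst) request to the earliest
    time slot in which neither its source nor its destination port is in
    use, so no two packets in a slot can collide. Greedy first-fit is an
    assumption, not the scheduler described in the source."""
    slots = []  # slots[t] is the list of (src, dst) pairs sent in slot t
    for src, dst in requests:
        for slot in slots:
            if all(s != src and d != dst for s, d in slot):
                slot.append((src, dst))
                break
        else:
            # No existing slot is conflict-free, so open a new one.
            slots.append([(src, dst)])
    return slots


def send_quick(packets):
    """Quick-channel sketch: every packet is sent immediately. Packets that
    target an already-claimed destination collide and must be retransmitted.
    Only destination conflicts are modeled, and 'first packet wins' is an
    assumption."""
    delivered, retransmit = [], []
    claimed = set()
    for src, dst in packets:
        if dst in claimed:
            retransmit.append((src, dst))
        else:
            claimed.add(dst)
            delivered.append((src, dst))
    return delivered, retransmit


requests = [(0, 1), (1, 1), (2, 3), (0, 3)]
bulk_slots = schedule_bulk(requests)
quick_ok, quick_retry = send_quick([(0, 2), (1, 2), (3, 0)])
```

In the bulk case every request is eventually sent without conflict, at the cost of waiting for its slot. In the quick case nothing waits, but the second packet to port 2 has to be resent.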