Deadlock-Free Message Routing in Multiprocessor Interconnection Networks
IEEE Transactions on Computers
Analysis and simulation of a fair queueing algorithm
SIGCOMM '89 Symposium proceedings on Communications architectures & protocols
Hardware-Assisted Software Clock Synchronization for Homogeneous Distributed Systems
IEEE Transactions on Computers
Virtual clock: a new traffic control algorithm for packet switching networks
SIGCOMM '90 Proceedings of the ACM symposium on Communications architectures & protocols
MPEG: a video compression standard for multimedia applications
Communications of the ACM - Special issue on digital multimedia systems
The Asynchronous Transfer Mode: a tutorial
Computer Networks and ISDN Systems - Special issue on the ATM—asynchronous transfer mode
Fundamental limits and tradeoffs of providing deterministic guarantees to VBR video traffic
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
U-Net: a user-level network interface for parallel and distributed computing
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
A comparison of architectural support for messaging in the TMC CM-5 and the Cray T3D
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
MPI-FM: high performance MPI on workstation clusters
Journal of Parallel and Distributed Computing - Special issue on workstation clusters and network-based computing
Designing and Implementing High-Performance Media-on-Demand Servers
IEEE Parallel & Distributed Technology: Systems & Technology
TNet: A Reliable System Area Network
IEEE Micro
Real-Time Communication in Multihop Networks
IEEE Transactions on Parallel and Distributed Systems
Priority Based Real-Time Communication for Large Scale Wormhole Networks
Proceedings of the 8th International Symposium on Parallel Processing
Quality of Service Support in High Speed, Wormhole Routing Networks
ICNP '96 Proceedings of the 1996 International Conference on Network Protocols (ICNP '96)
Bandwidth and latency guarantees in low-cost, high-performance networks
Bandwidth and latency guarantees in low-cost, high-performance networks
QoS provisioning in clusters: an investigation of Router and NIC design
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
User-space communication: a quantitative study
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
MediaWorm: A QoS Capable Router Architecture for Clusters
IEEE Transactions on Parallel and Distributed Systems
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Performance analysis of a QoS capable cluster interconnect
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Design and Evaluation of an HPVM-Based Windows NT Supercomputer
International Journal of High Performance Computing Applications
Hi-index | 0.01 |
FM-QoS employs a novel communication architecture based on network feedback to provide predictable communication performance (e.g. deterministic latencies and guaranteed bandwidths) for high speed cluster interconnects. Network feedback is combined with self-synchronizing communication schedules to achieve synchrony in the network interfaces (NIs). Based on this synchrony, the network can be scheduled to provide predictable performance without special network QoS hardware. We describe the key element of the FM-QoS approach, feedback-based synchronization (FBS), which exploits network feedback to synchronize senders. We use Petri nets to characterize the set of self-synchronizing communication schedules for which FBS is effective and to describe the resulting synchronization overhead as a function of the clock drift across the network nodes. Analytic modeling suggests that for clocks of quality 300 ppm (such as found in the Myrinet NI), a synchronization overhead less than 1% of the total communication traffic is achievable --- significantly better than previous software-based schemes and comparable to hardware-intensive approaches such as virtual circuits (e.g. ATM).We have built a prototype of FBS for Myricom s Myrinet network (a 1.28 Gbps cluster network) which demonstrates the viability of the approach by sharing network resources with predictable performance. The prototype, which implements the local node schedule in software, achieves predictable latencies of 23 µs for a single-switch, 8-node network and 2 KB packets. In comparison, the best-effort scheme achieves 104 µs for the same network without FBS. While this ratio of over four to one already demonstrates the viability of the approach, it includes nearly 10 µs of overhead due to the software implementation. For hardware implementations of local node scheduling, and for networks with cascaded switches, these ratios should be much larger factors.