Computation of cyclic redundancy checks via table look-up
Communications of the ACM
Parallel accelerated isocontouring for out-of-core visualization
PVGS '99 Proceedings of the 1999 IEEE symposium on Parallel visualization and graphics
When the CRC and TCP checksum disagree
Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communication
Parallel view-dependent isosurface extraction using multi-pass occlusion culling
PVG '01 Proceedings of the IEEE 2001 symposium on parallel and large-data visualization and graphics
Distributed processing of very large datasets with DataCutter
Parallel Computing - Clusters and computational grids for scientific computing
Fault-Tolerant Wormhole Routing Algorithms for Mesh Networks
IEEE Transactions on Computers
The Quadrics Network (QsNet): High-Performance Clustering Technology
HOTI '01 Proceedings of the The Ninth Symposium on High Performance Interconnects
Performance Modeling of Subnet Management on Fat Tree InfiniBand Networks using OpenSM
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 18 - Volume 19
Performance Characterization of a 10-Gigabit Ethernet TOE
HOTI '05 Proceedings of the 13th Symposium on High Performance Interconnects
Exploiting NIC architectural support for enhancing IP-based protocols on high-performance networks
Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part II
ISPASS '03 Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software
On the performance of TCP splicing for URL-aware redirection
USITS'99 Proceedings of the 2nd conference on USENIX Symposium on Internet Technologies and Systems - Volume 2
Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective
CCGRID '07 Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid
Hi-index | 0.00 |
Due to the growing need to tolerate network faults and congestion in high-end computing systems, supporting multiple network communication paths is becoming increasingly important. However, multi-path communication comes with the disadvantage of out-of-order arrival of packets (because packets may traverse different paths). While modern networking stacks such as the Internet Wide-Area RDMA Protocol (iWARP) over 10-Gigabit Ethernet (10GE) support multi-path communication, their current implementations do not handle out-of-order packets primarily owing to the overhead on in-order communication that it adds. Specifically, in iWARP, supporting out-of-order packets requires every packet to carry additional information causing significant overhead on packets that arrive in-order. Thus, in this paper, we analyze the trade-offs in designing a feature-complete iWARP stack, i.e., one that provides support for out-of-order arriving packets, and thus, multi-path systems, while focusing on the performance of in-order communication. We propose three feature-complete designs of iWARP and analyze the pros and cons of each of these designs using performance experiments based on several micro-benchmarks as well as an iso-surface visual rendering application. Our analysis reveals that the iWARP design providing the best overall performance depends on the particular characteristics of the upper layers and that different designs are optimal based on the metric of interest.