Future networking for scalable I/O

  • Authors:
  • H. Chen; J. Decker; N. Bierbaum

  • Affiliations:
  • Sandia National Laboratories, Livermore, CA (all authors)

  • Venue:
  • PDCN'06: Proceedings of the 24th IASTED International Conference on Parallel and Distributed Computing and Networks
  • Year:
  • 2006

Abstract

Large clustered computers provide low-cost compute cycles and have therefore promoted the development of sophisticated parallel-programming algorithms based on the Message Passing Interface (MPI). Storage platforms, however, have failed to keep pace with these advances. This paper compares standard 4X InfiniBand (IB) with 10-Gigabit Ethernet (10GbE) for use as a common infrastructure for both storage and message passing. Whereas IB accelerates protocol processing natively in hardware, the Ethernet hardware in this study provided similar acceleration through TCP Offload Engines (TOEs). We evaluated I/O performance using the IOZONE benchmark on the iSCSI-based TerraGRID parallel filesystem. Our evaluations show that 10GbE, with or without protocol offload, offered better throughput and latency than IB for socket-based applications. Although protocol offload yielded significant I/O performance improvements on both 10GbE and IB, substantial CPU time is still consumed handling the associated data copies and interrupts. Emerging RDMA technologies hold promise for removing this remaining CPU overhead, and we plan to continue this study by investigating the application of RDMA to parallel I/O.
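
The abstract describes throughput measurements taken with the IOZONE benchmark over a parallel filesystem mount. As a rough illustration of the kind of sequential-write throughput test that IOZONE automates, the C sketch below times large record-sized writes to a file and reports MiB/s. It is not the authors' benchmark or configuration; the mount path (/mnt/terragrid/testfile), the 1 MiB record size, and the 1 GiB total size are hypothetical values chosen for the example.

    /* Minimal sequential-write throughput sketch (illustrative only). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <time.h>

    #define RECORD_SIZE (1 << 20)   /* 1 MiB records (hypothetical) */
    #define NUM_RECORDS 1024        /* 1 GiB total   (hypothetical) */

    int main(int argc, char **argv)
    {
        /* Hypothetical parallel-filesystem mount point; override on the command line. */
        const char *path = (argc > 1) ? argv[1] : "/mnt/terragrid/testfile";

        char *buf = malloc(RECORD_SIZE);
        if (!buf) { perror("malloc"); return 1; }
        memset(buf, 0xA5, RECORD_SIZE);

        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); free(buf); return 1; }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < NUM_RECORDS; i++) {
            if (write(fd, buf, RECORD_SIZE) != RECORD_SIZE) {
                perror("write"); close(fd); free(buf); return 1;
            }
        }
        fsync(fd);                  /* force data out to the storage target */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        double mib  = (double)NUM_RECORDS * RECORD_SIZE / (1 << 20);
        printf("wrote %.0f MiB in %.2f s -> %.1f MiB/s\n", mib, secs, mib / secs);

        close(fd);
        free(buf);
        return 0;
    }

A real IOZONE run additionally sweeps file and record sizes and exercises read, re-read, and random patterns; this sketch only captures the basic timed-write idea behind the throughput numbers discussed above.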