A scalable and high performance software iSCSI implementation

Authors:
Abhijeet Joglekar;Michael E. Kounavis;Frank L. Berry
Affiliations:
Intel Research and Development, Hillsboro, OR;Intel Research and Development, Hillsboro, OR;Intel Research and Development, Hillsboro, OR
Venue:
FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Year:
2005

Abstract

In this paper we present two novel techniques for improving the performance of the Internet Small Computer Systems Interface (iSCSI) protocol, which is the basis for IP-based networked block storage today. We demonstrate that, by making a few modifications to an existing iSCSI implementation, it is possible to increase iSCSI protocol processing throughput from 1.4 Gbps to 3.6 Gbps. Our solution scales with CPU clock speed and can be easily implemented in software on any general-purpose processor, without requiring specialized iSCSI protocol processing hardware. To gain an in-depth understanding of the processing costs associated with an iSCSI protocol implementation, we built an iSCSI fast path in a user-level sandbox environment. We discovered that the generation of Cyclic Redundancy Codes (CRCs), which is required for data integrity, and the data copy operations, which are required for the interaction between iSCSI and TCP, represent the main bottlenecks in iSCSI protocol processing. We propose two optimizations to iSCSI implementations to address these bottlenecks. Our first optimization concerns the way CRCs are calculated. We replace the industry-standard algorithm proposed by Prof. Dilip Sarwate with 'Slicing-by-8' (SB8), a new algorithm capable, ideally, of reading arbitrarily large amounts of data at a time while keeping its memory requirement at a reasonable level. Our second optimization concerns the way iSCSI interacts with the TCP layer. We interleave the compute-intensive data integrity checks with the memory-access-intensive data copy operations to benefit from cache effects and hardware pipeline parallelism.
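
The two optimizations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the CRC32C (Castagnoli) polynomial that iSCSI digests use, and all function names (`build_tables`, `crc32c_sarwate`, `crc32c_sb8`, `copy_with_crc`) are hypothetical. `crc32c_sarwate` is the byte-at-a-time table lookup; `crc32c_sb8` consumes eight bytes per iteration using eight 256-entry tables; `copy_with_crc` checksums each chunk immediately after copying it. Python is used only for readability, so the cache and pipeline benefits described above would not be observable here; only the structure carries over.

```python
POLY = 0x82F63B78  # reflected CRC32C polynomial (assumed: the iSCSI digest CRC)

def build_tables(n=8):
    """tables[0] is the byte-wise (Sarwate) table; tables[k][b] is the CRC
    contribution of byte b when it is followed by k zero bytes."""
    t0 = []
    for b in range(256):
        c = b
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        t0.append(c)
    tables = [t0]
    for k in range(1, n):
        prev = tables[k - 1]
        tables.append([(prev[b] >> 8) ^ t0[prev[b] & 0xFF] for b in range(256)])
    return tables

TABLES = build_tables()

def crc32c_sarwate(data, crc=0):
    """Byte-at-a-time table lookup: one table access per input byte."""
    crc ^= 0xFFFFFFFF
    t0 = TABLES[0]
    for byte in data:
        crc = (crc >> 8) ^ t0[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF

def crc32c_sb8(data, crc=0):
    """Slicing-by-8 sketch: eight independent lookups per 8-byte block."""
    crc ^= 0xFFFFFFFF
    t = TABLES
    n8 = len(data) - len(data) % 8
    for i in range(0, n8, 8):
        lo = int.from_bytes(data[i:i + 4], "little") ^ crc
        hi = int.from_bytes(data[i + 4:i + 8], "little")
        crc = (t[7][lo & 0xFF] ^ t[6][(lo >> 8) & 0xFF] ^
               t[5][(lo >> 16) & 0xFF] ^ t[4][lo >> 24] ^
               t[3][hi & 0xFF] ^ t[2][(hi >> 8) & 0xFF] ^
               t[1][(hi >> 16) & 0xFF] ^ t[0][hi >> 24])
    t0 = t[0]
    for byte in data[n8:]:  # leftover bytes fall back to the byte-wise step
        crc = (crc >> 8) ^ t0[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF

def copy_with_crc(src, dst, chunk=4096):
    """Interleave copy and CRC: checksum each chunk right after copying it
    instead of making a separate pass over the whole buffer."""
    crc = 0
    for off in range(0, len(src), chunk):
        piece = src[off:off + chunk]
        dst[off:off + len(piece)] = piece
        crc = crc32c_sb8(piece, crc)
    return crc
```

For example, `copy_with_crc(payload, bytearray(len(payload)))` returns the same value as `crc32c_sarwate(payload)` while also filling the destination buffer. The eight tables occupy 8 × 256 32-bit entries, which is the kind of bounded memory cost the abstract alludes to.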