The tea-leaf reader algorithm: an efficient implementation of CRC-16 and CRC-32. Communications of the ACM.
Computation of cyclic redundancy checks via table look-up. Communications of the ACM.
Architectural considerations for a new generation of protocols. SIGCOMM '90 Proceedings of the ACM symposium on Communications architectures & protocols.
Fast software implementation of error detection codes. IEEE/ACM Transactions on Networking (TON).
A Tutorial on CRC Computations. IEEE Micro.
A Performance Analysis of the iSCSI Protocol. MSS '03 Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSS'03).
A Systematic Approach to Building High Performance Software-Based CRC Generators. ISCC '05 Proceedings of the 10th IEEE Symposium on Computers and Communications.
Obtaining High Performance for Storage Outsourcing. FAST '02 Proceedings of the 1st USENIX Conference on File and Storage Technologies.
Making the Most Out of Direct-Access Network Attached Storage. FAST '03 Proceedings of the 2nd USENIX Conference on File and Storage Technologies.
Storage Over IP: When Does Hardware Support Help? FAST '03 Proceedings of the 2nd USENIX Conference on File and Storage Technologies.
A Performance Comparison of NFS and iSCSI for IP-Networked Storage. FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies.
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference.
End system optimizations for high-speed TCP. IEEE Communications Magazine.
Features of the iSCSI protocol. IEEE Communications Magazine.
Performance study of iSCSI-based storage subsystems. IEEE Communications Magazine.
A nine year study of file system and storage benchmarking. ACM Transactions on Storage (TOS).
Design and implementation of a field programmable CRC circuit architecture. IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
Study on the data flow balance in NFS server with iSCSI. ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I.
In this paper we present two novel techniques for improving the performance of the Internet Small Computer Systems Interface (iSCSI) protocol, which is the basis for IP-based networked block storage today. We demonstrate that by making a few modifications to an existing iSCSI implementation, it is possible to increase iSCSI protocol processing throughput from 1.4 Gbps to 3.6 Gbps. Our solution scales with CPU clock speed and can be readily implemented in software on any general-purpose processor, without requiring specialized iSCSI protocol processing hardware. To gain an in-depth understanding of the processing costs of an iSCSI implementation, we built an iSCSI fast path in a user-level sandbox environment. We found that two operations are the main bottlenecks in iSCSI protocol processing: the generation of Cyclic Redundancy Codes (CRCs), which is required for data integrity, and the data copy operations, which are required for the interaction between iSCSI and TCP. We propose two optimizations to iSCSI implementations to address these bottlenecks. The first changes how CRCs are calculated: we replace the industry-standard table-lookup algorithm proposed by Prof. Dilip Sarwate, which consumes one byte per lookup step, with 'Slicing-by-8' (SB8), a new algorithm that can in principle read arbitrarily large amounts of data at a time while keeping its memory requirements at a reasonable level. The second changes how iSCSI interacts with the TCP layer: we interleave the compute-intensive data integrity checks with the memory-access-intensive data copy operations to benefit from cache effects and hardware pipeline parallelism.