Scaling TCP/IP receive-side processing to 10 Gbps on commercial server platforms has been a major challenge. This led to the development of two key techniques: Large Receive Offload (LRO) and Direct Cache Access (DCA). Systems supporting both techniques have only recently become available. We therefore evaluate them in detail using 10 Gigabit NICs to determine whether 10 Gbps rates are finally achievable, to understand the performance benefit they offer, and to identify the major overheads that remain. Our measurements show that LRO and DCA together improve TCP/IP receive performance by more than 50% over the base case (neither LRO nor DCA enabled). Combined with improvements in CPU architecture and the rest of the platform over the last 3-4 years, these two techniques have more than doubled TCP/IP receive processing throughput, to 7 Gbps. Our detailed architectural characterization of TCP/IP processing with both features enabled reveals that buffer management and copy operations still take up a significant amount of processing time. We also analyze the scaling behavior of TCP/IP to determine how multi-core architectures improve network processing. This part of our analysis highlights some limiting factors that need to be addressed to achieve scaling beyond 10 Gbps.