A majority of current and next-generation server applications (web services, e-commerce, storage, etc.) employ TCP/IP as their communication protocol of choice. As a result, the performance of these applications depends heavily on efficient TCP/IP packet processing at the termination nodes. This dependency grows as the bandwidth needs of these applications rise from 100 Mbps to 1 Gbps and, in the near future, to 10 Gbps. Motivated by this, the work presented in this paper aims (a) to understand the performance behavior of the various modes of TCP/IP processing, (b) to analyze the underlying architectural characteristics of TCP/IP packet processing, and (c) to quantify the computational requirements of the TCP/IP packet processing component within realistic workloads. We achieve these goals through an in-depth analysis of packet processing performance on Intel's state-of-the-art low-power Pentium® M microprocessor running the Microsoft Windows Server 2003 operating system. Some of our key observations are (i) that the mode of TCP/IP operation can significantly affect the performance requirements, (ii) that transmit-side processing is largely compute-intensive, whereas receive-side processing is more memory-bound, and (iii) that the computational requirements for sending and receiving packets can form a substantial component (28% to 40%) of commercial server workloads. From our analysis, we also discuss architectural as well as stack-related improvements that can help achieve higher server network throughput and improve application performance.
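One common way to make "computational requirements" of packet processing concrete is to express them as CPU cycles per byte transferred, derived from measured CPU utilization, clock frequency, and throughput. The Python sketch below assumes that metric; the function names (`cycles_per_byte`, `ghz_per_gbps`, `required_hz`, `network_share`) and all numbers are hypothetical illustrations, not the study's methodology or measurements.

```python
# Hypothetical helpers for expressing TCP/IP processing cost. The metric
# choice (cycles per byte) and every number below are illustrative assumptions.


def cycles_per_byte(cpu_utilization: float, cpu_hz: float, throughput_bps: float) -> float:
    """CPU cycles consumed per byte transferred at one measured operating point."""
    if not 0.0 <= cpu_utilization <= 1.0:
        raise ValueError("cpu_utilization must be a fraction in [0, 1]")
    if throughput_bps <= 0:
        raise ValueError("throughput_bps must be positive")
    busy_cycles_per_sec = cpu_utilization * cpu_hz  # cycles actually spent
    bytes_per_sec = throughput_bps / 8              # bits/s -> bytes/s
    return busy_cycles_per_sec / bytes_per_sec


def ghz_per_gbps(cpb: float) -> float:
    """Same cost expressed as GHz of CPU needed per Gbps (i.e. cycles per bit)."""
    return cpb / 8


def required_hz(cpb: float, target_bps: float) -> float:
    """CPU frequency needed to sustain target_bps at a fixed cycles-per-byte cost."""
    return cpb * target_bps / 8


def network_share(net_cycles: float, total_cycles: float) -> float:
    """Fraction of a workload's cycles spent sending and receiving packets."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return net_cycles / total_cycles


if __name__ == "__main__":
    # Hypothetical operating point: a 1.6 GHz core, 50% busy, moving 1 Gbps.
    cpb = cycles_per_byte(0.5, 1.6e9, 1e9)
    print(f"{cpb:.1f} cycles/byte, {ghz_per_gbps(cpb):.1f} GHz per Gbps")
    # If the per-byte cost stayed constant, 10 Gbps would need this much CPU:
    print(f"{required_hz(cpb, 10e9) / 1e9:.1f} GHz for 10 Gbps")
```

With these made-up inputs the sketch reports 6.4 cycles/byte (0.8 GHz per Gbps) and about 8 GHz of CPU for 10 Gbps. Holding cycles per byte constant is a deliberate simplification: in practice the cost varies with packet size, the mode of TCP/IP operation, and memory behavior, which the abstract notes dominates on the receive side.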