Applications requiring high-speed TCP/IP processing can easily saturate a modern server. We and others have previously suggested alleviating this problem in multiprocessor environments by dedicating a subset of the processors to perform network packet processing. The remaining processors perform only application computation, thus eliminating contention between these functions for processor resources. Applications interact with packet processing engines (PPEs) using an asynchronous I/O (AIO) programming interface that bypasses the operating system. A key attraction of this overall approach is that it exploits the architectural trend toward greater thread-level parallelism in future systems based on multi-core processors. In this paper, we conduct a detailed experimental performance analysis comparing this approach to a best-practice configured Linux baseline system.

We have built a prototype system implementing this architecture, ETA+AIO (Embedded Transport Acceleration with Asynchronous I/O), and ported a high-performance web server to the AIO interface. Although the prototype uses modern single-core CPUs instead of future multi-core CPUs, an analysis of its performance can reveal important properties of this approach. Our experiments show that the ETA+AIO prototype has a modest advantage over the baseline Linux system in packet processing efficiency, consuming fewer CPU cycles to sustain the same throughput. This efficiency advantage enables the ETA+AIO prototype to achieve higher peak throughput than the baseline system, but only for workloads where the mix of packet processing and application processing approximately matches the allocation of CPUs in the ETA+AIO system, thereby enabling high utilization of all the CPUs.
Detailed analysis shows that the efficiency advantage of the ETA+AIO prototype, which uses one PPE CPU, comes from avoiding multiprocessing overheads in packet processing, lower overhead of our AIO interface compared to standard sockets, and reduced cache misses due to processor partitioning.
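
The workload-mix condition described above can be illustrated with a back-of-the-envelope model. The sketch below is not the paper's methodology and uses none of its measurements. The function names, the 20% packet-processing saving, the clock rate, and the per-request cycle count are all hypothetical values chosen only to show why a partitioned system wins over a narrow range of mixes.

```python
# Illustrative model only: every number below is a hypothetical assumption,
# not a measurement from the ETA+AIO prototype.

def baseline_peak(n_cpus, cycles_per_sec, net_cost, app_cost):
    """Peak requests/s when every CPU does both packet and application work."""
    return n_cpus * cycles_per_sec / (net_cost + app_cost)


def partitioned_peak(n_cpus, n_ppe, cycles_per_sec, net_cost, app_cost, net_saving):
    """Peak requests/s when n_ppe CPUs do only packet processing.

    net_saving is the assumed fractional reduction in packet-processing
    cycles from partitioning. Throughput is limited by whichever
    partition saturates first.
    """
    ppe_rate = n_ppe * cycles_per_sec / (net_cost * (1.0 - net_saving))
    app_rate = (n_cpus - n_ppe) * cycles_per_sec / app_cost
    return min(ppe_rate, app_rate)


def compare(net_fraction, total_cost=100_000, cycles_per_sec=1e9):
    """Return (baseline, partitioned) peaks for a 2-CPU system with 1 PPE."""
    net_cost = net_fraction * total_cost
    app_cost = total_cost - net_cost
    base = baseline_peak(2, cycles_per_sec, net_cost, app_cost)
    part = partitioned_peak(2, 1, cycles_per_sec, net_cost, app_cost, 0.2)
    return base, part


if __name__ == "__main__":
    # Sweep the share of per-request cycles spent on packet processing.
    for fraction in (0.2, 0.5, 0.55, 0.8):
        base, part = compare(fraction)
        print(f"net fraction {fraction:.2f}: baseline {base:.0f} req/s, "
              f"partitioned {part:.0f} req/s")
```

In this toy model the partitioned configuration beats the baseline only when packet processing accounts for slightly more than half of the per-request work, which matches the one-PPE, one-application-CPU split. When the mix is skewed either way, one partition sits idle and the baseline comes out ahead.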