Multi-core end-systems use Receive Side Scaling (RSS) to parallelize protocol processing. RSS applies a hash function to the standard flow descriptors and uses an indirection table to assign incoming packets to receive queues, which are pinned to specific cores. This ensures flow affinity: the interrupt processing for all packets belonging to a given flow is handled by the same core. A key limitation of standard RSS is that, in determining flow affinity, it does not consider the application process that consumes the incoming data. In this paper, we carry out a detailed experimental analysis of the performance impact of application affinity in a 40 Gbps testbed network with a dual hexa-core end-system. We show, contrary to conventional wisdom, that when the application process and the flow are affinitized to the same core, performance (measured as end-to-end TCP throughput) is significantly lower than the line rate. Near-line-rate performance is observed when the flow and the application process are affinitized to different cores on the same socket. Furthermore, affinitizing the application and the flow to cores on different sockets results in throughput significantly lower than the line rate. These results arise from a memory bottleneck, which we demonstrate using preliminary correlational data on the cache hit rate of the core that services the application process.
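The RSS dispatch path described above (hash the flow descriptor, look up a receive queue in the indirection table, deliver to the core the queue is pinned to) can be sketched as follows. This is a minimal illustration, not the implementation evaluated in the paper: the queue count, table size, queue-to-core mapping, and all function names are hypothetical, and CRC32 is a stand-in for the Toeplitz hash that RSS-capable NICs typically use.

```python
import socket
import struct
import zlib

NUM_QUEUES = 4    # hypothetical number of receive queues
TABLE_SIZE = 128  # hypothetical indirection table size

# Indirection table: hash bucket -> receive queue (filled round-robin here).
INDIRECTION_TABLE = [i % NUM_QUEUES for i in range(TABLE_SIZE)]

# Hypothetical pinning of each receive queue to one core.
QUEUE_TO_CORE = {0: 0, 1: 1, 2: 2, 3: 3}


def flow_hash(src_ip, dst_ip, src_port, dst_port):
    """Hash the flow descriptor (IPv4 addresses and ports).

    CRC32 is only a stand-in for the NIC's real hash function.
    """
    key = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!HH", src_port, dst_port)
    )
    return zlib.crc32(key)


def select_queue(src_ip, dst_ip, src_port, dst_port):
    """Map a flow to a receive queue through the indirection table."""
    bucket = flow_hash(src_ip, dst_ip, src_port, dst_port) % TABLE_SIZE
    return INDIRECTION_TABLE[bucket]


def select_core(src_ip, dst_ip, src_port, dst_port):
    """Return the core that handles interrupt processing for this flow."""
    return QUEUE_TO_CORE[select_queue(src_ip, dst_ip, src_port, dst_port)]


if __name__ == "__main__":
    flow = ("10.0.0.1", "10.0.0.2", 5000, 80)
    print("queue:", select_queue(*flow), "core:", select_core(*flow))
```

Because the mapping depends only on the flow descriptor, every packet of a flow reaches the same core, yet nothing in this path refers to the core on which the consuming application runs. That is the limitation of standard RSS noted above.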