Characterizing the impact of end-system affinities on the end-to-end performance of high-speed flows

  • Authors:
  • Nathan Hanford (University of California, Davis, CA)
  • Vishal Ahuja (University of California, Davis, CA)
  • Mehmet Balman (Lawrence Berkeley Laboratory, Berkeley, CA)
  • Matthew K. Farrens (University of California, Davis, CA)
  • Dipak Ghosal (University of California, Davis, CA)
  • Eric Pouyoul (Lawrence Berkeley Laboratory, Berkeley, CA)
  • Brian Tierney (Lawrence Berkeley Laboratory, Berkeley, CA)

  • Venue:
  • NDM '13 Proceedings of the Third International Workshop on Network-Aware Data Management
  • Year:
  • 2013

Abstract

Multi-core end-systems use Receive Side Scaling (RSS) to parallelize protocol processing. RSS applies a hash function to the standard flow descriptors and uses an indirection table to assign incoming packets to receive queues, which are pinned to specific cores. This ensures flow affinity: the interrupt processing for all packets belonging to a given flow is handled by the same core. A key limitation of standard RSS is that, in determining flow affinity, it does not consider the application process that consumes the incoming data. In this paper, we carry out a detailed experimental analysis of the performance impact of application affinity in a 40 Gbps testbed network with a dual hexa-core end-system. We show, contrary to conventional wisdom, that when the application process and the flow are affinitized to the same core, performance (measured as end-to-end TCP throughput) is significantly lower than line rate. Near line-rate performance is observed when the flow and the application process are affinitized to different cores on the same socket. Furthermore, affinitizing the application and the flow to cores on different sockets results in throughput significantly lower than line rate. These results arise from a memory bottleneck, which is demonstrated using preliminary correlational data on the cache hit rate of the core that services the application process.
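
The queue-selection step described in the abstract (a hash over the flow descriptors, followed by an indirection-table lookup) can be illustrated with a minimal Python sketch. It is not the authors' code or any NIC's actual configuration. It assumes a Toeplitz-style hash, which is the hash commonly used for RSS, an arbitrary 40-byte key, a 128-entry indirection table, and 12 receive queues to mirror a dual hexa-core host. The names `RSS_KEY`, `NUM_QUEUES`, `INDIRECTION_TABLE`, `toeplitz_hash`, and `rss_queue` are hypothetical and chosen only for illustration.

```python
import ipaddress
import struct

# Assumptions for illustration only: an arbitrary 40-byte key, 12 receive
# queues (one per core of a dual hexa-core host), and a 128-entry
# indirection table filled round-robin.
RSS_KEY = bytes(range(1, 41))
NUM_QUEUES = 12
INDIRECTION_TABLE = [i % NUM_QUEUES for i in range(128)]


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Toeplitz-style hash: for every set bit of the input, XOR in the
    32-bit window of the key that starts at that bit position.
    Requires len(key) >= len(data) + 4."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rss_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    """Map an IPv4/TCP flow descriptor to a receive queue index."""
    data = (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!HH", src_port, dst_port)  # ports in network byte order
    )
    h = toeplitz_hash(RSS_KEY, data)
    # The hash indexes the indirection table; the entry names the queue.
    return INDIRECTION_TABLE[h % len(INDIRECTION_TABLE)]


if __name__ == "__main__":
    # Every packet of the same flow lands on the same queue, hence the same core.
    print(rss_queue("10.0.0.1", "10.0.0.2", 5001, 5201))
```

Note that nothing in this mapping refers to the core on which the receiving application runs, which is the limitation the paper examines.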