Optimizing the BSD routing system for parallel processing

  • Authors:
  • Qing Li; Kip Macy

  • Affiliations:
  • Blue Coat Systems, Inc., Sunnyvale, CA, USA; The FreeBSD Project, Palo Alto, CA, USA

  • Venue:
  • Proceedings of the 2nd ACM SIGCOMM workshop on Programmable routers for extensible services of tomorrow
  • Year:
  • 2009

Abstract

The routing architecture of the original 4.4BSD [3] kernel has been deployed successfully without major design modification for over 15 years. In this unified routing architecture, layer-3 (L3) IP routes are maintained together with layer-2 (L2) ARP entries in the same kernel structures, which means that a single table lookup can return both results. Today, the prevalence of multi-core CPUs and parallel processor architectures is driving the redesign of software data structures and control flows to fully exploit the parallel capabilities of commodity hardware. A common parallel TCP/IP network protocol stack design separates L2 and L3 processing from layer-4 (L4) and layer-5 (L5) processing (TCP and sockets) and places them on different CPU cores. The unified routing architecture creates data dependencies between these layers, which complicates the design and causes high levels of lock contention. In this paper we detail the routing architecture that we have implemented for the upcoming FreeBSD 8.0 kernel, which eliminates these data dependencies and facilitates better parallelization of the network protocol stack. We describe the impact of this design on higher-layer protocol processing, such as TCP and UDP flow processing, and provide a performance comparison between the original and the new design.
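To make the idea of removing the L2/L3 data dependency concrete, the following is a minimal user-space sketch in C and not the FreeBSD 8.0 implementation, which the abstract does not show. It assumes IPv4 only, small fixed-size arrays with linear scans, and a simple spinlock per table. Every name in it (`l3_table`, `l2_table`, `l3_lookup`, `l2_lookup`, and so on) is hypothetical. The point it illustrates is only that the route table and the neighbor (ARP) table are separate structures with separate locks, so a route lookup never takes the lock that guards L2 entries.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 16

/* Hypothetical L3 entry: prefix -> next hop. It holds no L2 data. */
struct l3_route { uint32_t prefix; uint8_t prefix_len; uint32_t next_hop; };

/* Hypothetical L2 entry: IPv4 address -> MAC address. */
struct l2_neigh { uint32_t ip; uint8_t mac[6]; };

/* Each table has its own lock (an assumption made for this sketch). */
struct l3_table { atomic_flag lock; struct l3_route routes[MAX_ENTRIES]; size_t n; };
struct l2_table { atomic_flag lock; struct l2_neigh neigh[MAX_ENTRIES]; size_t n; };

void tbl_lock(atomic_flag *f)
{
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire)) { /* spin */ }
}

void tbl_unlock(atomic_flag *f)
{
    atomic_flag_clear_explicit(f, memory_order_release);
}

void l3_init(struct l3_table *t) { atomic_flag_clear(&t->lock); t->n = 0; }
void l2_init(struct l2_table *t) { atomic_flag_clear(&t->lock); t->n = 0; }

/* Network mask for a prefix length; length 0 is special-cased to avoid a 32-bit shift. */
uint32_t mask_of(uint8_t len) { return len == 0 ? 0u : 0xFFFFFFFFu << (32 - len); }

/* Add a route; returns 0 on success, -1 if the table is full. */
int l3_add(struct l3_table *t, uint32_t prefix, uint8_t len, uint32_t next_hop)
{
    int rc = -1;
    assert(len <= 32);
    tbl_lock(&t->lock);
    if (t->n < MAX_ENTRIES) {
        t->routes[t->n++] = (struct l3_route){ prefix & mask_of(len), len, next_hop };
        rc = 0;
    }
    tbl_unlock(&t->lock);
    return rc;
}

/* Longest-prefix match. Takes only the L3 lock. Returns 0 and sets *next_hop on a hit. */
int l3_lookup(struct l3_table *t, uint32_t dst, uint32_t *next_hop)
{
    int best_len = -1;
    tbl_lock(&t->lock);
    for (size_t i = 0; i < t->n; i++) {
        const struct l3_route *r = &t->routes[i];
        if ((dst & mask_of(r->prefix_len)) == r->prefix && (int)r->prefix_len > best_len) {
            best_len = r->prefix_len;
            *next_hop = r->next_hop;
        }
    }
    tbl_unlock(&t->lock);
    return best_len >= 0 ? 0 : -1;
}

/* Add a neighbor entry; returns 0 on success, -1 if the table is full. */
int l2_add(struct l2_table *t, uint32_t ip, const uint8_t mac[6])
{
    int rc = -1;
    tbl_lock(&t->lock);
    if (t->n < MAX_ENTRIES) {
        t->neigh[t->n].ip = ip;
        memcpy(t->neigh[t->n].mac, mac, 6);
        t->n++;
        rc = 0;
    }
    tbl_unlock(&t->lock);
    return rc;
}

/* Resolve an IPv4 address to a MAC address. Takes only the L2 lock. */
int l2_lookup(struct l2_table *t, uint32_t ip, uint8_t mac_out[6])
{
    int rc = -1;
    tbl_lock(&t->lock);
    for (size_t i = 0; i < t->n; i++) {
        if (t->neigh[i].ip == ip) {
            memcpy(mac_out, t->neigh[i].mac, 6);
            rc = 0;
            break;
        }
    }
    tbl_unlock(&t->lock);
    return rc;
}
```

In this sketch a caller that needs both answers performs two lookups, `l3_lookup` for the next hop and then `l2_lookup` on that next hop, instead of the single combined lookup of the unified design. The trade-off is an extra lookup in exchange for two tables that different cores can use without contending on one shared lock.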