Optimizing the BSD routing system for parallel processing

Authors:
Qing Li; Kip Macy
Affiliations:
Blue Coat Systems, Inc., Sunnyvale, CA, USA; The FreeBSD Project, Palo Alto, CA, USA
Venue:
Proceedings of the 2nd ACM SIGCOMM workshop on Programmable routers for extensible services of tomorrow
Year:
2009

References:
TCP/IP Illustrated (Vol. 2): The Implementation
Load Balancing Servers, Firewalls, and Caches
The Design and Implementation of the FreeBSD Operating System
IPv6 Core Protocols Implementation (The Morgan Kaufmann Series in Networking)

Abstract

The routing architecture of the original 4.4BSD [3] kernel has been deployed successfully without major design modification for over 15 years. In this unified routing architecture, layer-3 (L3) IP routes are maintained together with layer-2 (L2) ARP entries in the same kernel structures, which means that a single table lookup can return both results. Today, the prevalence of multi-core CPUs and parallel processor architectures is driving the redesign of software data structures and control flows to fully exploit the parallel capabilities of commodity hardware. A common parallel TCP/IP protocol stack design separates L2 and L3 processing from layer-4 (L4) and layer-5 (L5) processing (TCP and sockets) and runs them on different CPU cores. The unified routing architecture creates data dependencies between these layers, complicating the design and causing high levels of lock contention. In this paper we detail the routing architecture that we have implemented for the upcoming FreeBSD 8.0 kernel, which eliminates these data dependencies and facilitates better parallelization of the network protocol stack. We describe the impact of this design on higher-layer protocols such as TCP and UDP flow processing, and provide a performance comparison between the original and the new design.
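
The contrast between a unified table and separate L3 and L2 tables can be illustrated with a minimal sketch in C. Every type and function name below (`unified_entry`, `l3_route`, `l2_neighbor`, `unified_lookup`, `l3_lookup`, `l2_lookup`) is hypothetical and chosen for illustration only; these are not the FreeBSD kernel structures or the design described in the paper. The sketch assumes IPv4 addresses in host byte order, contiguous netmasks, and flat arrays in place of the kernel's real lookup structures. Locking is omitted, but the comments note where a lock would sit in each layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN 6

/* Unified layout (hypothetical): one entry holds the L3 route and the
 * resolved L2 address, so one lookup yields both. A single lock would
 * have to guard both kinds of data. */
struct unified_entry {
    uint32_t prefix;
    uint32_t mask;
    uint32_t next_hop;
    uint8_t  mac[MAC_LEN];
};

/* Split layout (hypothetical): routes and neighbors live in separate
 * tables, each of which could be guarded by its own lock. */
struct l3_route {
    uint32_t prefix;
    uint32_t mask;
    uint32_t next_hop;
};

struct l2_neighbor {
    uint32_t ip;
    uint8_t  mac[MAC_LEN];
};

/* Longest-prefix match over the unified table; NULL if nothing matches. */
const struct unified_entry *
unified_lookup(const struct unified_entry *t, size_t n, uint32_t dst)
{
    const struct unified_entry *best = NULL;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* With contiguous masks, a numerically larger mask is a longer prefix. */
        if ((dst & t[i].mask) == t[i].prefix &&
            (best == NULL || t[i].mask > best->mask))
            best = &t[i];
    }
    return best;
}

/* Longest-prefix match over the L3-only table; NULL if nothing matches. */
const struct l3_route *
l3_lookup(const struct l3_route *t, size_t n, uint32_t dst)
{
    const struct l3_route *best = NULL;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((dst & t[i].mask) == t[i].prefix &&
            (best == NULL || t[i].mask > best->mask))
            best = &t[i];
    }
    return best;
}

/* Exact-match lookup of a next-hop IP in the L2-only table; NULL if absent. */
const struct l2_neighbor *
l2_lookup(const struct l2_neighbor *t, size_t n, uint32_t ip)
{
    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (t[i].ip == ip)
            return &t[i];
    }
    return NULL;
}
```

In the split layout a caller first resolves the destination with `l3_lookup` and then passes the returned `next_hop` to `l2_lookup`. That costs a second lookup, but the two tables no longer share entries, so code that only needs routing information does not have to touch (or lock) link-layer state.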