Practice of parallelizing network applications on multi-core architectures

  • Authors:
  • Junchang Wang;Haipeng Cheng;Bei Hua;Xinan Tang

  • Affiliations:
  • University of Science and Technology of China, Hefei, Anhui, China;University of Science and Technology of China, Hefei, Anhui, China;University of Science and Technology of China, Hefei, Anhui, China;Intel Compiler Lab, Santa Clara, California, USA

  • Venue:
  • Proceedings of the 23rd international conference on Supercomputing
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

The industry wide shift to multi-core architectures arouses great interests in parallelizing sequential applications. However, it is very difficult to parallelize fine-grained applications for multi-core architectures due to insufficient hardware support of fast communication and synchronization. Fortunately, network applications can be decomposed into pipelined structures that are amenable to streaming based parallel processing. To realize the potential of pipelining on multi-core architectures, it requires reevaluating the basic tradeoffs in parallel processing, including the ones between load balance and data locality and between general lock mechanisms and special lock-free data structures. This paper presents the practice of building a high-performance multi-core based network processing platform in which connection-affinity and lock-free design principles are applied effectively for better data locality and faster core-to-core synchronization and communication. We parallelize a complete Layer 2 to Layer 7 (L2-L7) network processing system on an Intel Core 2 Quad processor, including a TCP/IP stack based on Libnids (L2-L4) and a port-independent protocol identification engine by deep packet inspection (L7+). Furthermore, we develop a compiling method to transform sequential network applications to parallel ones to enable those applications to run on multi-core architectures. Our experience suggests that (1) fine-grained pipelining can be a good software solution for parallelizing network applications on multi-core architectures if connection-affinity and lock-free are used as the first design principles; (2) a delicate partitioning scheme is required to map pipelined structures onto specific multi-core architecture; (3) an automatic parallelization approach can work if domain knowledge is considered in the parallelizing process. Our multi-core based network processing platform can deliver not only 6Gbps processing speed for large packet sizes but also more challenging 2Gbps speed for smaller packets.