Scalable Pattern Matching on Multicore Platform via Dynamic Differentiated Distributed Detection (D⁴)

  • Authors:
  • Kai Zheng; Hongbin Lu; Erich Nahum

  • Affiliations:
  • IBM China Research Lab, Beijing, China; Tsinghua University, Beijing, China; IBM T.J. Watson Research Center, Hawthorne, NY, USA

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 2011

Abstract

Pattern Matching (PM) is a key building block for many emerging network applications. Modern multicore platforms are becoming performance-competitive with traditional hardware solutions, which are expensive and hard to adapt to the rapid diversification of Internet applications. However, because network flow sizes are uneven and packet order must be retained within each flow, traditional parallel processing models that use packet flows as the basic unit for partitioning the workload cannot take full advantage of multicore platforms, exhibiting low CPU utilization and poor scalability as the number of CPUs or cores increases. In this paper, we propose a novel parallel inspection model called Dynamic Differentiated Distributed Detection (D⁴). D⁴ achieves balanced parallel detection by adding one more dimension to PM workload partitioning: the pattern set is prepartitioned into several subsets so that the workload of hot flows can be distributed across multiple cores while packet order within each flow is still maintained. We also show theoretically that a higher number of subsets leads to higher algorithmic overhead. To achieve optimal throughput for all flow size distributions, D⁴ prepartitions the pattern set in several ways beforehand, one for each detection mode, and then dynamically switches among these modes on the fly according to the flow and runtime information it senses. D⁴ also allows multiple PM algorithms to work simultaneously on different pattern subsets. The detection mode selection and subset partitioning algorithms are designed, based on several heuristics and the algorithms' characteristics, to maximize CPU/core utilization while avoiding unnecessary overhead. Experiments using real-world pattern sets and traffic traces show that D⁴ features high core utilization and low overhead, thus achieving distinct performance gains over traditional load balancing schemes.
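
The pattern-set partitioning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the round-robin partitioning policy, the naive substring scan, the byte threshold for a "hot" flow, and all function names (`partition_patterns`, `scan`, `choose_mode`, `detect`) are assumptions introduced here for illustration only. Python threads also do not provide true multicore parallelism for this kind of work, so the sketch shows only the structure of the approach: each packet of a hot flow is scanned against several pattern subsets concurrently and the partial results are merged, so packets of one flow are still processed in order.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_patterns(patterns, k):
    """Split the pattern set into k subsets (round-robin; an assumed policy)."""
    return [patterns[i::k] for i in range(k)]


def scan(payload, subset):
    """Naive substring scan of one payload against one pattern subset."""
    return {p for p in subset if p in payload}


def choose_mode(flow_bytes, modes, hot_threshold=1_000_000):
    """Pick a subset count: 1 for ordinary flows, the largest mode for hot flows.

    The threshold is a hypothetical heuristic, not a value from the paper.
    """
    return max(modes) if flow_bytes >= hot_threshold else 1


def detect(payload, patterns, k, pool):
    """Scan one packet against k pattern subsets concurrently and merge matches."""
    subsets = partition_patterns(patterns, k)
    partial = pool.map(lambda s: scan(payload, s), subsets)
    return set().union(*partial)


if __name__ == "__main__":
    patterns = [b"attack", b"virus", b"worm", b"exploit"]
    payload = b"GET /index.html virus-and-worm payload"
    with ThreadPoolExecutor(max_workers=4) as pool:
        k = choose_mode(flow_bytes=5_000_000, modes=[1, 2, 4])
        print(k, sorted(detect(payload, patterns, k, pool)))
```

Whatever the subset count, the merged result is the same set of matches; only the distribution of work changes. Each subset requires its own pass over the payload, which is consistent with the abstract's remark that more subsets mean more algorithmic overhead.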