Scalable Pattern Matching on Multicore Platform via Dynamic Differentiated Distributed Detection (D⁴)

Authors:
Kai Zheng;Hongbin Lu;Erich Nahum
Affiliations:
IBM China Research Lab, Beijing, China;Tsinghua University, Beijing, China;IBM T.J. Watson Research Center, Hawthorne, NY, USA
Venue:
IEEE Transactions on Computers
Year:
2011

Abstract

Pattern Matching (PM) is a key building block for many emerging network applications. Modern multicore platforms are becoming performance-competitive with traditional hardware solutions, which are expensive and hard to adapt to the rapid diversification of Internet applications. However, because network flow sizes are uneven and packet order must be retained within each flow, traditional parallel processing models that partition the workload by packet flow cannot take full advantage of multicore platforms, exhibiting low CPU utilization and poor scalability as the number of CPUs or cores increases. In this paper, we propose a novel parallel inspection model called Dynamic Differentiated Distributed Detection (D⁴). D⁴ achieves balanced parallel detection by adding one more dimension to PM workload partitioning: the pattern set is prepartitioned into several subsets so that the workload of hot flows can be distributed across multiple cores while packet order is still maintained within each flow. We also show theoretically that a higher number of subsets leads to higher algorithmic overhead. To achieve optimal throughput for all flow size distributions, D⁴ partitions the pattern set in several ways beforehand, one for each detection mode, and then switches among these modes dynamically, on the fly, according to the flow and runtime information it senses. D⁴ also allows multiple PM algorithms to work simultaneously on different pattern subsets. The detection mode selection and subset partitioning algorithms are designed, based on several heuristics and the characteristics of the PM algorithms, to maximize CPU/core utilization while avoiding unnecessary overhead. Experiments using real-world pattern sets and traffic traces show that D⁴ features high core utilization and low overhead, achieving distinct performance gains over traditional load balancing schemes.
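
The idea of partitioning the pattern set, rather than only the flows, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the round-robin split, the naive substring scan standing in for a real PM algorithm, and the names `partition_patterns`, `scan`, and `detect` are all assumptions made for illustration. It only shows that scanning one packet against k pattern subsets and merging the results finds the same matches as scanning against the whole set, which is what allows a single hot flow to occupy several cores without reordering its packets.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_patterns(patterns, k):
    """Split the pattern set into k subsets (round-robin; a hypothetical policy)."""
    return [patterns[i::k] for i in range(k)]


def scan(payload, subset):
    """Naive stand-in for a real PM algorithm: list (offset, pattern) matches."""
    hits = []
    for pat in subset:
        start = payload.find(pat)
        while start != -1:
            hits.append((start, pat))
            start = payload.find(pat, start + 1)
    return hits


def detect(payload, subsets):
    """Scan one packet against every subset concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=len(subsets)) as pool:
        parts = pool.map(lambda s: scan(payload, s), subsets)
    return sorted(h for part in parts for h in part)


# Illustrative data only; not taken from the paper's pattern sets or traces.
patterns = [b"attack", b"root", b"passwd", b"cmd.exe"]
payload = b"GET /etc/passwd?user=root&x=cmd.exe"

single = detect(payload, partition_patterns(patterns, 1))  # one subset, one worker
split = detect(payload, partition_patterns(patterns, 3))   # three subsets, three workers
```

Here `single` and `split` hold the same match list, so the number of subsets can be varied per detection mode without changing the detection result. The thread pool only mirrors the structure of the scheme; CPython threads do not run this scan on multiple cores at once, and the sketch does not model the per-subset algorithmic overhead that the abstract says grows with the number of subsets.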