An efficient parallelized L7-filter design for multicore servers

Authors:
Danhua Guo; Laxmi Narayan Bhuyan; Bin Liu
Affiliations:
Microsoft Corporation, Mountain View, CA; Computer Science and Engineering Department, University of California, Riverside, Riverside, CA; Computer Science and Technology Department, Tsinghua University, Beijing, China
Venue:
IEEE/ACM Transactions on Networking (TON)
Year:
2012

Abstract

L7-filter is a significant deep packet inspection (DPI) extension to Netfilter in Linux's QoS framework. It classifies network traffic based on information hidden in the packet payload. Although the computationally intensive payload classification can be accelerated with multiple processors, the default OS scheduler is oblivious to both the software characteristics and the underlying multicore architecture. In this paper, we present a parallelized L7-filter algorithm and an efficient scheduling technique for multicore servers. Our multithreaded L7-filter algorithm can process the incoming packets on multiple servers, boosting the throughput tremendously. Our scheduling algorithm is based on Highest Random Weight (HRW), which maintains connection locality for the incoming traffic but guarantees load balance only at the connection level. We present an Adapted Highest Random Weight (AHRW) algorithm that enhances HRW by applying packet-level load balancing with an additional feedback vector corresponding to the queue length at each processor. We further introduce a Hierarchical AHRW (AHRW-tree) algorithm that accounts for characteristics of the multicore architecture, such as cache and hardware topology, by developing a hash tree architecture. The algorithm reduces the scheduling overhead from O(N) to O(log N) and strikes a better balance between locality and load balancing. Results show that the AHRW-tree scheduler can improve the L7-filter throughput by about 50% on a Sun-Niagara-2-based server compared to a connection-locality-based scheduler. Although extensively tested on L7-filter traces, our technique is applicable to many other packet processing applications in which connection locality and load balancing are important when executing on multiple processors. With these speedups and inherent software flexibility, our design and implementation provide a cost-effective alternative to traffic monitoring and filtering ASICs.
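The scheduling ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weight function, the exact form of the feedback vector, or the tree layout, so the SHA-256 based weight, the `max_queue` scaling rule, the node names, and the function names `hrw_pick`, `ahrw_pick`, and `tree_pick` are all assumptions made for illustration only.

```python
import hashlib


def weight(conn_key: str, node: str) -> int:
    """Pseudo-random weight for a (connection, node) pair from a stable hash.

    Assumption: any well-mixed hash works here; SHA-256 is chosen for clarity.
    """
    digest = hashlib.sha256(f"{conn_key}|{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def hrw_pick(conn_key, nodes):
    """Plain HRW: every packet of a connection maps to the highest-weight node."""
    return max(nodes, key=lambda n: weight(conn_key, n))


def ahrw_pick(conn_key, nodes, queue_len, max_queue=64):
    """Hypothetical queue-aware variant.

    Each weight is scaled by the free share of that node's queue, so a
    congested node loses even if its raw weight is the highest. The scaling
    rule and max_queue are illustrative assumptions, not the paper's formula.
    """
    def adjusted(n):
        free = max(max_queue - queue_len.get(n, 0), 0)
        return weight(conn_key, n) * free

    return max(nodes, key=adjusted)


def tree_pick(conn_key, tree):
    """Hierarchical selection: hash only among siblings at each level.

    `tree` is a nested dict whose innermost values are lists of leaf names.
    With constant fan-out, the number of weights computed grows with the
    tree depth rather than with the total number of leaves.
    """
    node = tree
    while isinstance(node, dict):
        child = hrw_pick(conn_key, sorted(node))
        node = node[child]
    return hrw_pick(conn_key, node)


if __name__ == "__main__":
    key = "10.0.0.1:1234->10.0.0.2:80"
    cpus = ["cpu0", "cpu1", "cpu2", "cpu3"]
    home = hrw_pick(key, cpus)
    print("HRW choice:", home)
    print("choice when that queue is full:", ahrw_pick(key, cpus, {home: 64}))
    print("tree choice:", tree_pick(key, {"core0": ["t0", "t1"], "core1": ["t2", "t3"]}))
```

With empty queues the adjusted choice coincides with plain HRW, which preserves connection locality; the choice moves away from the home processor only when its queue fills up, which is the locality versus load-balance trade-off the abstract describes.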