Scalable packet classification using interpreting: a cross-platform multi-core solution

Authors:
Haipeng Cheng; Zheng Chen; Bei Hua; Xinan Tang
Affiliations:
University of Science and Technology of China, Hefei, China; University of Science and Technology of China, Hefei, China; University of Science and Technology of China, Hefei, China; Intel Compiler Lab, Santa Clara, CA, USA
Venue:
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Year:
2008

Abstract

Packet classification is an enabling technology for advanced Internet services, yet achieving 10 Gbps (line-rate) classification in software remains a challenge. This paper presents a classification algorithm that can be implemented efficiently on a multi-core architecture with or without a cache. The algorithm takes a holistic approach: it exploits application characteristics, accounts for the capabilities of the CPU and the memory hierarchy, and partitions data appropriately. Classification proceeds in two stages: a search on a reduction tree followed by a search on a list of ranges. This design rests on a classification heuristic: after the first-stage search, the size of the range list is limited. Optimizations are then designed to speed up the two-stage execution. To exploit the speed gaps (1) between the CPU and external memory and (2) between internal memory (cache) and external memory, an interpreter is used to trade idle CPU cycles for reduced memory access requirements. Applying a CISC style of instruction encoding to compress the range expressions not only significantly reduces the total memory requirement but also makes effective use of internal memory (cache) bandwidth. We show that compressing data structures is an effective optimization across multi-core architectures. We implement the algorithm on both the Intel IXP2800 network processor and the Core 2 Duo x86 architecture, and evaluate it with the ClassBench classification benchmark. By incorporating architecture awareness into algorithm design and taking the memory hierarchy, data partitioning, and latency hiding into account in the implementation, the resulting algorithm scales well on the Intel IXP2800. By using the cache system effectively, the algorithm also runs faster than RFC, previously the fastest algorithm, on the Core 2 Duo architecture.
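
The two-stage idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the reduction tree layout or the instruction encoding, so everything here is an assumption made for illustration. A dictionary keyed on the top byte of the source address stands in for the reduction tree, and the opcodes `OP_EXACT`, `OP_RANGE`, and `OP_ANY`, the byte layout, and the function names are hypothetical. The sketch shows only the general technique: the first stage narrows the search to a short range list, and the second stage interprets a compact, variable-length encoding of that list, spending a few extra CPU operations on decoding in exchange for fewer bytes fetched from memory.

```python
import struct

# Hypothetical opcodes for a compact, variable-length ("CISC-style") encoding.
OP_EXACT = 0  # one 16-bit operand: matches key == value
OP_RANGE = 1  # two 16-bit operands: matches lo <= key <= hi
OP_ANY = 2    # no operand: matches every key


def encode_rule(rule_id, op, *operands):
    """Pack one rule as: opcode byte, rule-id byte, then big-endian 16-bit operands."""
    return bytes([op, rule_id]) + b"".join(struct.pack(">H", v) for v in operands)


def interpret(program, key):
    """Stage 2: decode the packed range list; return the first matching rule id."""
    pos = 0
    while pos < len(program):
        op, rule_id = program[pos], program[pos + 1]
        pos += 2
        if op == OP_EXACT:
            (value,) = struct.unpack_from(">H", program, pos)
            pos += 2
            if key == value:
                return rule_id
        elif op == OP_RANGE:
            lo, hi = struct.unpack_from(">HH", program, pos)
            pos += 4
            if lo <= key <= hi:
                return rule_id
        elif op == OP_ANY:
            return rule_id
        else:
            raise ValueError(f"unknown opcode {op}")
    return None  # no rule in this list matched


def classify(tree, src_ip, dst_port):
    """Stage 1: pick a short range list by the top byte of src_ip, then run stage 2."""
    program = tree.get(src_ip >> 24, b"")
    return interpret(program, dst_port)


# Toy rule set (assumed data): three candidate rules for 10.x.x.x, one for 192.x.x.x.
TREE = {
    10: encode_rule(1, OP_EXACT, 80)
    + encode_rule(2, OP_RANGE, 1024, 2047)
    + encode_rule(3, OP_ANY),
    192: encode_rule(4, OP_ANY),
}
```

With this toy rule set, `classify(TREE, 0x0A000001, 1500)` returns rule 2, and the whole range list for the `10.x.x.x` bucket occupies 12 bytes, because each rule stores only the operands its opcode needs. A fixed-width layout would store two bounds for every rule regardless of its form.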