High-performance packet classification algorithm for multithreaded IXP network processor

Authors:
Duo Liu; Zheng Chen; Bei Hua; Nenghai Yu; Xinan Tang
Affiliations:
Southwest University of Science and Technology, University of Science and Technology of China, and Suzhou Institute for Advanced Study of USTC; University of Science and Technology of China, Hefei, P.R. China; University of Science and Technology of China and Suzhou Institute for Advanced Study of USTC, Hefei, P.R. China; University of Science and Technology of China, Hefei, P.R. China; Intel Corporation, Santa Clara, California
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2008

Abstract

Packet classification is crucial for the Internet to provide more value-added services and guaranteed quality of service. Besides hardware-based solutions, many software-based classification algorithms have been proposed. However, classifying at 10 Gbps or higher remains challenging and is still one of the performance bottlenecks in core routers. In general, classification algorithms must balance high classification speed against low memory requirements. This paper proposes a modified recursive flow classification (RFC) algorithm, Bitmap-RFC, which significantly reduces the memory requirements of RFC by applying a bitmap compression technique. To increase classification speed, we exploit multithreaded architectural features at every stage of development, from algorithm design to implementation. As a result, Bitmap-RFC strikes a good balance between speed and space: it maintains high classification speed while substantially reducing memory consumption. This paper investigates the main NPU software design aspects that have a dramatic performance impact on any NPU-based implementation: memory space reduction, instruction selection, data allocation, task partitioning, and latency hiding. We experiment with an architecture-aware design principle to guarantee high performance of the classification algorithm in an NPU implementation. The experimental results show that Bitmap-RFC achieves 10 Gbps or higher and scales well on the Intel IXP2800 NPU.
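
The abstract does not spell out how the bitmap compression is encoded, so the following is only a minimal sketch of the general idea, under stated assumptions: a lookup table with long runs of identical entries is stored as one bitmap word per block plus a compact list of run values, and a lookup counts the set bits up to the requested position. The block size `BLOCK` and the names `compress` and `lookup` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

BLOCK = 32  # entries covered by one bitmap word (assumed, not from the paper)


def popcount(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")


def compress(table: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Compress a run-heavy table into (bitmaps, offsets, values).

    A bit is set wherever an entry differs from the entry before it;
    only those entries are kept in `values`. `offsets[b]` is the number
    of values stored before block b.
    """
    bitmaps: List[int] = []
    offsets: List[int] = []
    values: List[int] = []
    prev = None
    for start in range(0, len(table), BLOCK):
        offsets.append(len(values))
        bits = 0
        for i, v in enumerate(table[start:start + BLOCK]):
            if v != prev:
                bits |= 1 << i
                values.append(v)
                prev = v
        bitmaps.append(bits)
    return bitmaps, offsets, values


def lookup(bitmaps: List[int], offsets: List[int],
           values: List[int], idx: int) -> int:
    """Return the original table[idx] from the compressed form."""
    block, bit = divmod(idx, BLOCK)
    mask = (1 << (bit + 1)) - 1          # bits 0..bit inclusive
    ones = popcount(bitmaps[block] & mask)
    # If no bit is set yet in this block, the run continues from the
    # previous block, so the index falls back to the last earlier value.
    return values[offsets[block] + ones - 1]


if __name__ == "__main__":
    table = [7] * 40 + [3] * 10 + [7] * 14 + [9] * 36
    bitmaps, offsets, values = compress(table)
    print(len(table), "entries stored as", len(values), "values")
    print(lookup(bitmaps, offsets, values, 45))
```

In this sketch the space saving comes from storing one value per run instead of one per entry, and the cost is an extra bit count per lookup, which is cheap on processors that provide a population-count instruction.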