High-performance packet classification algorithm for many-core and multithreaded network processor

Authors:
Duo Liu;Bei Hua;Xianghui Hu;Xinan Tang
Affiliations:
University of Science and Technology of China, Hefei, China;University of Science and Technology of China, Hefei, China;University of Science and Technology of China, Hefei, China;Intel Compiler Lab, Santa Clara, California
Venue:
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Year:
2006

Abstract

Packet classification is crucial for the Internet to provide more value-added services and guaranteed quality of service. Besides hardware-based solutions, many software-based classification algorithms have been proposed. However, classifying at 10 Gbps or higher remains a challenging problem and is still one of the performance bottlenecks in core routers. In general, classification algorithms face the same challenge of balancing high classification speed against low memory requirements. This paper proposes a modified Recursive Flow Classification (RFC) algorithm, Bitmap-RFC, which significantly reduces the memory requirements of RFC by applying a bitmap compression technique. To increase classification speed, we experiment with exploiting the architectural features of a many-core and multithreaded architecture from algorithm design through algorithm implementation. As a result, Bitmap-RFC strikes a good balance between speed and space: it maintains high classification speed while significantly reducing memory space.

This paper investigates the main NPU software design aspects that have dramatic performance impacts on any NPU-based implementation: memory space reduction, instruction selection, data allocation, task partitioning, and latency hiding. We experiment with an architecture-aware design principle to guarantee the high performance of the classification algorithm on an NPU implementation. The experimental results show that the Bitmap-RFC algorithm achieves 10 Gbps or higher and scales well on the Intel IXP2800 NP.
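
The abstract does not spell out how Bitmap-RFC encodes its tables, so the following is only a minimal sketch of the general idea behind bitmap compression of a lookup table, assuming the table contains long runs of identical entries. A bitmap marks where each run starts, only one value per run is stored, and a lookup recovers the original entry by counting set bits. The names `compress` and `lookup` are hypothetical and are not taken from the paper.

```python
def compress(table):
    """Keep one value per run of equal entries, plus a bitmap of run starts."""
    bitmap = 0      # bit i is set when table[i] starts a new run
    values = []     # one stored value per run, in table order
    for i, v in enumerate(table):
        if i == 0 or v != table[i - 1]:
            bitmap |= 1 << i
            values.append(v)
    return bitmap, values


def lookup(bitmap, values, i):
    """Recover the original table[i] from the compressed form."""
    mask = (1 << (i + 1)) - 1                 # keep bits 0..i only
    rank = bin(bitmap & mask).count("1")      # runs started at or before i
    return values[rank - 1]


# Illustrative data only: eight entries collapse to three stored values.
table = [3, 3, 3, 7, 7, 1, 1, 1]
bitmap, values = compress(table)
```

In this sketch the space saving comes from storing three values and one 8-bit bitmap instead of eight entries, at the cost of one population count per lookup. That cost is small on processors that provide a population-count instruction, which is one way such a scheme can trade a little computation for a large reduction in memory.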