Fast and scalable layer four switching
Proceedings of the ACM SIGCOMM '98 conference on Applications, technologies, architectures, and protocols for computer communication
High-speed policy-based packet forwarding using efficient multi-dimensional range matching
Proceedings of the ACM SIGCOMM '98 conference on Applications, technologies, architectures, and protocols for computer communication
Packet classification on multiple fields
Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication
Scalable packet classification
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
Building a robust software-based router using network processors
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
IXP-1200 Programming
Space Decomposition Techniques for Fast Layer-4 Switching
PfHSN '99 Proceedings of the IFIP TC6 WG6.1 & WG6.4 / IEEE ComSoc TC on on Gigabit Networking Sixth International Workshop on Protocols for High Speed Networks VI
Algorithms for packet classification
IEEE Network: The Magazine of Global Internetworking
Hint-based cache design for reducing miss penalty in HBS packet classification algorithm
Journal of Parallel and Distributed Computing
Hi-index | 0.24 |
Multi-field packet classification is frequently performed by network devices such as edge routers and firewalls-such devices can utilize programmable network processors to perform this compute-intensive task at nearly line speeds. The architectures of programmable network processors are typically highly parallel and a single algorithm can be mapped in different ways onto the hardware. In this paper, we study the performance of two different design mappings of the Bit Vector packet classification algorithm on the Intel^(R) IXP1200 network processor. We show that: (i) Overall, the parallel mapping has better packet processing rate (25% more) than the pipelined mapping; (ii) In the parallel mapping, a processing element's utilization can be considerably affected by code complexity, in terms of branching, because of significant time wasted (as much as 40% more) due to aborting instruction execution pipelines; (iii) In the pipelined mapping, multiple memory reads per packet can lower the overall performance.