Improving the throughput and delay performance of network processors by applying push model

Authors:
Bin Liu;Bo Yuan;Huichen Dai;Hongbo Zhao;Jia Yu;Laxmi Bhuyan
Affiliations:
Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;University of California, Riverside;University of California, Riverside
Venue:
Proceedings of the 2012 IEEE 20th International Workshop on Quality of Service
Year:
2012

Abstract

Traditional network processors (NPs) adopt a pull model, in which NP cores pull packet data from external memory into local memory, triggered by cache misses or fetch instructions. Because data fetching has long latency, hardware multithreading is typically used to reduce the waiting time. Multithreading, however, incurs context-switch overhead, which leads to inefficiency in payload-processing applications. We propose a push model for future NP architectures to increase throughput and decrease processing delay. A hardware push unit helps move the segments of a packet into a core's local memory, reducing hardware thread switching. We give theoretical analyses comparing the performance of the pull and push models, and we verify the design on our FPGA-based THNPU NP platform. Experimental results indicate that the push model not only improves system throughput but also reduces delay, at the cost of only a fractional increase in logic gates.
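
The trade-off described in the abstract can be illustrated with a toy cost model. The sketch below is not the paper's theoretical analysis. It is a simplified latency-hiding model in which every name and number (`CoreParams`, the cycle counts, the thread count) is a hypothetical placeholder, and it assumes the push unit always delivers a segment to local memory before the core needs it.

```python
from dataclasses import dataclass


@dataclass
class CoreParams:
    """Hypothetical per-core parameters; none of these values come from the paper."""
    compute_cycles: int  # useful work per packet segment
    mem_latency: int     # cycles to fetch one segment from external memory
    switch_cycles: int   # cycles lost on each hardware thread switch
    threads: int         # hardware threads per core


def pull_cycles_per_segment(p: CoreParams) -> float:
    """Pull model: a thread computes, issues a fetch, and is switched out.

    The core pays compute + switch per segment. If the threads together
    cannot cover the memory latency, the core also idles, so the cost
    per segment is bounded below by the round trip divided by the
    number of threads.
    """
    busy = p.compute_cycles + p.switch_cycles
    round_trip = p.mem_latency + busy
    return max(busy, round_trip / p.threads)


def push_cycles_per_segment(p: CoreParams) -> float:
    """Push model (assumption): the segment is already in local memory,
    so the core pays neither a fetch stall nor a thread switch."""
    return float(p.compute_cycles)


if __name__ == "__main__":
    params = CoreParams(compute_cycles=40, mem_latency=200,
                        switch_cycles=10, threads=4)
    pull = pull_cycles_per_segment(params)
    push = push_cycles_per_segment(params)
    print(f"pull: {pull} cycles/segment, push: {push} cycles/segment")
    print(f"modelled speedup: {pull / push:.2f}x")
```

Under these assumptions, adding threads to the pull model removes idle time but never the per-segment switch cost, which is the overhead the push model is meant to avoid.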