Traditional network processors (NPs) adopt a pull model, in which NP cores pull packet data from external memory into local memory, triggered by cache misses or explicit fetch instructions. Because data fetching has long latency, hardware multithreading is typically used to hide the waiting time. Multithreading, however, incurs context-switch overhead, which leads to inefficiency in payload-processing applications. We propose a push model for the architectural design of future NPs to increase throughput and decrease processing delay. A hardware push unit moves the segments of a packet into a core's local memory, reducing hardware thread switching. Theoretical analyses compare the performance of the pull and push models. We further verify the design on our FPGA-based THNPU NP platform. Experimental results indicate that the push model not only improves system throughput but also reduces delay, with only a fractional increase in logic gates.
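The trade-off between the two models can be illustrated with a toy per-core cost estimate. This is a minimal sketch and not the theoretical analysis of the paper: the function names, the cost formula, and all parameter values (compute cycles per segment, memory latency, context-switch cost, thread count) are assumptions chosen for illustration. It assumes that under the pull model each segment costs compute plus switch cycles and that the fetch latency is overlapped across hardware threads, while under the push model the push unit always delivers the segment before the core needs it.

```python
def pull_cycles_per_segment(compute, mem_latency, switch_cost, threads):
    """Toy estimate of core cycles per packet segment under a pull model.

    Assumption: each segment needs `compute` cycles plus one context switch,
    and the fetch latency is overlapped across `threads` hardware threads.
    """
    busy = compute + switch_cost
    # With too few threads the core stalls on memory; with enough threads
    # the remaining cost is the compute work plus the switch overhead.
    return max(busy, (busy + mem_latency) / threads)


def push_cycles_per_segment(compute):
    """Toy estimate under a push model.

    Assumption: the push unit has already placed the segment in local
    memory, so the core neither stalls nor switches threads.
    """
    return compute


if __name__ == "__main__":
    # All numbers are hypothetical and do not come from the paper.
    for threads in (1, 2, 4, 8):
        pull = pull_cycles_per_segment(compute=20, mem_latency=100,
                                       switch_cost=4, threads=threads)
        push = push_cycles_per_segment(compute=20)
        print(f"threads={threads}: pull={pull:.1f} push={push} "
              f"ratio={pull / push:.2f}")
```

In this sketch, adding threads hides the fetch latency but never removes the per-segment switch cost, which is the overhead the push model is meant to avoid.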