Improving latency tolerance of network processors through simultaneous multithreading

Authors:
Bo Liang;Hong An;Fang Lu;Rui Guo
Affiliations:
Department of Computer Science and Technology, University of Science and Technology of China, Hefei, China (all authors)
Venue:
APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
Year:
2005

Abstract

Existing multithreaded network processor architectures with multiple processing engines (PEs) rely on blocked multithreading, which executes instructions from different user-defined threads in the same PE pipeline in an explicit, interleaved fashion. Multiple PEs, each a multithreaded processor core, process several packets in parallel to hide long memory access latency. Most of these designs are optimized for throughput, mainly in the data plane. In future network workloads, the boundary between data plane and control plane will blur, so PEs must deliver not only wire-speed packet forwarding in the data plane but also highly intelligent and increasingly complex packet processing in the control plane. In this paper, we analyze the potential of simultaneous multithreading (SMT) to tolerate short latencies when used in out-of-order, dynamically scheduled PE cores. We show that 2- to 4-issue SMT provides excellent tolerance of short memory and branch latencies, achieving higher instruction throughput with much simpler structures.
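
The latency-hiding idea behind blocked multithreading can be illustrated with a toy cycle-level model. This is only a sketch under stated assumptions, not the simulator or methodology used in the paper: the function name `blocked_mt_utilization`, its parameters, the zero-cost thread switch, and the fixed compute-burst and memory-latency values are all hypothetical. It does not model SMT, out-of-order issue, or branch latency.

```python
def blocked_mt_utilization(n_threads, run_cycles, mem_latency, total_cycles=10_000):
    """Toy model of one PE using blocked multithreading.

    Assumptions (hypothetical, for illustration only):
    - each thread computes for `run_cycles`, then issues a memory access
      that takes `mem_latency` cycles and blocks the thread;
    - switching to another ready thread costs zero cycles;
    - the PE issues at most one instruction per cycle.
    Returns the fraction of cycles in which the pipeline does useful work.
    """
    ready_at = [0] * n_threads            # cycle at which each thread may run again
    remaining = [run_cycles] * n_threads  # compute cycles left before the next access
    current = None                        # thread currently occupying the pipeline
    busy = 0

    for cycle in range(total_cycles):
        if current is None:
            # Fixed-priority pick of the first ready thread (may starve
            # high-numbered threads once the pipeline is saturated).
            for t in range(n_threads):
                if ready_at[t] <= cycle:
                    current = t
                    break
        if current is None:
            continue  # every thread is waiting on memory: the pipeline idles

        busy += 1
        remaining[current] -= 1
        if remaining[current] == 0:
            # The thread issues its memory access and is switched out.
            remaining[current] = run_cycles
            ready_at[current] = cycle + 1 + mem_latency
            current = None

    return busy / total_cycles


if __name__ == "__main__":
    for n in (1, 2, 5):
        print(n, blocked_mt_utilization(n, run_cycles=10, mem_latency=40))
```

In this model, utilization grows with the thread count until the combined compute bursts of the other threads cover one memory latency. Long latencies are hidden this way, but short stalls inside a burst are not modeled at all, and short latencies are the case for which the abstract proposes SMT.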