Reconciling performance and programmability in networking systems

Authors:
Jayaram Mudigonda;Harrick M. Vin;Stephen W. Keckler
Affiliations:
University of Texas at Austin, Austin, TX;University of Texas at Austin, Austin, TX;University of Texas at Austin, Austin, TX
Venue:
Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications
Year:
2007

Abstract

Challenges in addressing the memory bottleneck have made it difficult to design a packet processing platform that achieves both ease of programming and high performance. Today's commercial processors support two architectural mechanisms, namely hardware multithreading and caching, to overcome the memory bottleneck. The configurations of these mechanisms (e.g., cache capacity, number of threads per processor core) are fixed at processor-design time. The relative effectiveness of these mechanisms, however, varies significantly with application, traffic, and system characteristics. Thus, programmers often struggle to achieve high performance from a processor that is not well suited to a particular deployment. To address this challenge, we first make a case for, and then develop, a malleable processor architecture that facilitates the dynamic reconfiguration of cache capacity and number of threads to best suit the needs of each deployment. We then present an algorithm that can determine the optimal thread-cache balance at run time. Together, the architecture and the algorithm allow us to achieve the goals of ease of programming and high performance simultaneously. We demonstrate that our processor outperforms a processor similar to Intel's IXP2800, a state-of-the-art commercial network processor, in about 89% of the deployments we consider. Further, in about 30% of the deployments our platform improves the throughput by as much as 300%.
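
The thread-cache trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the paper's algorithm, which the abstract does not specify. It assumes a fixed on-chip storage budget that is split between per-thread context and cache, a made-up saturating hit-rate curve, and a textbook multithreading utilization formula. All names and parameters (`hit_rate`, `throughput`, `best_split`, `units_per_thread`, `working_set_units`, and so on) are hypothetical.

```python
def hit_rate(cache_units, working_set_units):
    """Assumed toy curve: hit rate grows with cache size and saturates toward 1."""
    return cache_units / (cache_units + working_set_units)


def throughput(threads, budget_units, units_per_thread,
               compute_cycles, mem_accesses, miss_latency, working_set_units):
    """Packets per cycle for one core under the toy model.

    Storage not used for thread contexts is treated as cache. A thread
    computes for `compute_cycles` per packet and stalls on every cache miss;
    other threads can hide that stall until the core is fully utilized.
    """
    cache_units = budget_units - threads * units_per_thread
    if threads < 1 or cache_units < 0 or working_set_units <= 0:
        return 0.0
    miss_fraction = 1.0 - hit_rate(cache_units, working_set_units)
    stall_cycles = mem_accesses * miss_fraction * miss_latency
    utilization = min(1.0, threads * compute_cycles / (compute_cycles + stall_cycles))
    return utilization / compute_cycles


def best_split(budget_units, units_per_thread, **workload):
    """Exhaustively try every thread count and return (threads, cache_units).

    Ties are resolved in favor of fewer threads (more cache).
    """
    max_threads = budget_units // units_per_thread
    best = max(range(1, max_threads + 1),
               key=lambda n: throughput(n, budget_units, units_per_thread, **workload))
    return best, budget_units - best * units_per_thread


if __name__ == "__main__":
    # A workload whose working set is far larger than any possible cache:
    # in this model, threads are the better use of the storage budget.
    print(best_split(64, 8, compute_cycles=100, mem_accesses=10,
                     miss_latency=200, working_set_units=1e9))
```

In this model, a workload with poor locality pushes the split toward more threads, while a workload with negligible miss cost gains nothing from extra threads. That is the kind of deployment-dependent variation the abstract argues a fixed configuration cannot accommodate.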