Reconciling performance and programmability in networking systems

  • Authors: Jayaram Mudigonda, Harrick M. Vin, Stephen W. Keckler
  • Affiliations: University of Texas at Austin, Austin, TX (all authors)
  • Venue: Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications
  • Year: 2007

Abstract

Challenges in addressing the memory bottleneck have made it difficult to design a packet processing platform that achieves both ease of programming and high performance. Today's commercial processors support two architectural mechanisms, hardware multithreading and caching, to overcome the memory bottleneck. The configurations of these mechanisms (e.g., cache capacity, number of threads per processor core) are fixed at processor-design time. The relative effectiveness of these mechanisms, however, varies significantly with application, traffic, and system characteristics. Thus, programmers often struggle to achieve high performance from a processor that is not well suited to a particular deployment. To address this challenge, we first make a case for, and then develop, a malleable processor architecture that facilitates the dynamic reconfiguration of cache capacity and number of threads to best suit the needs of each deployment. We then present an algorithm that can determine the optimal thread-cache balance at run-time. The combination of these two allows us to simultaneously achieve the goals of ease of programming and high performance. We demonstrate that our processor outperforms a processor similar to Intel's IXP2800 (a state-of-the-art commercial network processor) in about 89% of the deployments we consider. Further, in about 30% of the deployments our platform improves the throughput by as much as 300%.
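
The abstract does not describe how the run-time algorithm works, so the following is only an illustrative sketch of the general idea of choosing a thread-cache balance, not the authors' method. It assumes a fixed on-chip budget that can be split between thread contexts and cache, enumerates the splits, and keeps the one with the highest measured throughput. The names `candidate_configs`, `pick_best`, and `toy_throughput`, the budget units, and the throughput model are all hypothetical; a real system would replace `toy_throughput` with an actual measurement.

```python
from typing import Callable, Iterable, List, Tuple

# (threads per core, cache capacity in KB); both fields are illustrative.
Config = Tuple[int, int]


def candidate_configs(budget_units: int, kb_per_unit: int) -> List[Config]:
    """Enumerate splits of a fixed budget between thread contexts and cache.

    Assumption: one budget unit buys either one thread context or
    `kb_per_unit` KB of cache. This trade-off ratio is hypothetical.
    """
    return [
        (threads, (budget_units - threads) * kb_per_unit)
        for threads in range(1, budget_units + 1)
    ]


def pick_best(configs: Iterable[Config],
              measure: Callable[[Config], float]) -> Config:
    """Return the configuration with the highest measured throughput."""
    return max(configs, key=measure)


def toy_throughput(config: Config) -> float:
    """Toy stand-in for a measurement: core utilization in [0, 1].

    More cache lowers the miss rate; more threads hide the remaining
    memory stalls. The constants are made up for illustration.
    """
    threads, cache_kb = config
    miss_rate = 1.0 / (1.0 + cache_kb / 16.0)
    compute_time = 1.0
    stall_time = 10.0 * miss_rate
    return min(1.0, threads * compute_time / (compute_time + stall_time))


if __name__ == "__main__":
    configs = candidate_configs(budget_units=8, kb_per_unit=16)
    best = pick_best(configs, toy_throughput)
    print("best (threads, cache KB):", best)
```

Under this toy model, both extremes (nearly all cache or nearly all threads) score worse than intermediate splits, which mirrors the abstract's point that the best balance depends on the deployment.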