Overcoming the memory wall in packet processing: hammers or ladders?

  • Authors:
  • Jayaram Mudigonda;Harrick M. Vin;Raj Yavatkar

  • Affiliations:
  • University of Texas at Austin;University of Texas at Austin;Intel Corporation

  • Venue:
  • Proceedings of the 2005 ACM symposium on Architecture for networking and communications systems
  • Year:
  • 2005

Abstract

The overhead of memory accesses limits the performance of packet processing applications. To overcome this bottleneck, today's network processors can utilize a wide range of mechanisms, such as multi-level memory hierarchies, wide-word accesses, special-purpose result caches, asynchronous memory, and hardware multi-threading. However, supporting all of these mechanisms complicates programmability and hardware design, and wastes system resources. In this paper, we address the following fundamental question: what minimal set of hardware mechanisms must a network processor support to achieve the twin goals of simplified programmability and high packet throughput? We show that no single mechanism suffices; the minimal set must include data caches and multi-threading. Data caches and multi-threading are complementary: whereas data caches exploit locality to reduce the number of context switches and the off-chip memory bandwidth requirement, multi-threading exploits parallelism to hide long cache-miss latencies.
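The complementarity claim can be illustrated with a back-of-the-envelope model. The sketch below is not the paper's methodology. It is a textbook-style utilization estimate under simplifying assumptions: zero-cost context switches, cache hits folded into the compute cycles, and misses spread evenly across a packet's processing. The function name, parameters, and numbers are all hypothetical.

```python
def utilization(compute_cycles, mem_accesses, hit_rate, miss_latency, threads):
    """Estimate processor utilization for a toy packet-processing core.

    Assumptions (illustrative only, not taken from the paper):
      - each packet needs `compute_cycles` of work and `mem_accesses` loads
      - a fraction `hit_rate` of loads hit the data cache at no extra cost
      - each miss stalls the issuing thread for `miss_latency` cycles
      - a thread switch on a miss is free, and misses are evenly spaced
    """
    misses = mem_accesses * (1.0 - hit_rate)
    if misses == 0:
        return 1.0  # no off-chip accesses, so the core never stalls
    run = compute_cycles / misses  # cycles a thread runs between misses
    # One thread is busy for run / (run + miss_latency) of the time;
    # additional threads fill the stall time until the core saturates.
    return min(1.0, threads * run / (run + miss_latency))


if __name__ == "__main__":
    # Hypothetical workload: 100 compute cycles, 10 loads, 100-cycle misses.
    for hit_rate in (0.0, 0.9):
        for threads in (1, 4):
            u = utilization(100, 10, hit_rate, 100, threads)
            print(f"hit_rate={hit_rate} threads={threads} utilization={u:.2f}")
```

In this toy setting, neither the cache alone (one thread, 90% hit rate) nor multi-threading alone (four threads, no cache) keeps the core fully busy, while the combination does. The model ignores off-chip bandwidth limits, which the abstract names as a second benefit of data caches.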