Memory Hierarchy Design for a Multiprocessor Look-up Engine

Authors:
Jean-Loup Baer; Douglas Low; Patrick Crowley; Neal Sidhwaney
Venue:
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Year:
2003

Abstract

We investigate the implementation of IP look-up for core routers using multiple microengines and a tailored memory hierarchy. The main architectural concerns are limiting the number of memory accesses and the contention for them. Using a level compressed trie as an index, we show the impact of its main parameter, the root branching factor, on the memory capacity and the number of memory accesses. Despite the lack of locality, we show how a cache can reduce the required memory capacity and limit the amount of expensive multibanking. Simulation experiments using contemporary routing tables show that the architecture scales well, at least up to 16 processors, and that a small on-chip cache increases throughput by up to 65% over an architecture with the same number of processors but no cache, while also reducing the required off-chip memory.
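To illustrate the trade-off between root branching factor, memory capacity, and memory accesses, here is a minimal sketch under stated assumptions. It is not the level compressed trie itself, which also applies path compression and chooses strides per node. Instead it substitutes a simpler fixed-stride multibit trie whose root stride `root_bits` is a free parameter (root branching factor 2^`root_bits`), built with controlled prefix expansion. The names `MultibitTrie`, `insert`, `lookup`, `slots`, `build`, and the toy `ROUTES` table are hypothetical and do not come from the paper. Each visited trie level counts as one memory access, and the total number of array slots stands in for memory capacity.

```python
# Sketch: multibit trie with a configurable root branching factor.
# Assumption: a simplified stand-in for an LC-trie (no path compression,
# fixed stride below the root); all names here are hypothetical.


class Node:
    """Array of 2**stride slots; each slot may hold a next hop and a child."""

    def __init__(self, stride: int):
        size = 1 << stride
        self.hops = [None] * size      # next hop stored after prefix expansion
        self.lens = [-1] * size        # length of the prefix that set each hop
        self.children = [None] * size  # pointer to the next trie level


class MultibitTrie:
    """IPv4 trie: the root consumes root_bits bits, deeper levels `stride` bits."""

    def __init__(self, root_bits: int, stride: int = 4):
        if (32 - root_bits) % stride:
            raise ValueError("32 - root_bits must be a multiple of stride")
        self.root_bits, self.stride = root_bits, stride
        self.root = Node(root_bits)
        self.nodes = 1

    def insert(self, prefix: int, length: int, hop: str) -> None:
        node, consumed, bits = self.root, 0, self.root_bits
        # Descend while the prefix extends past the bits this level covers.
        while length > consumed + bits:
            idx = (prefix >> (32 - consumed - bits)) & ((1 << bits) - 1)
            if node.children[idx] is None:
                node.children[idx] = Node(self.stride)
                self.nodes += 1
            node, consumed, bits = node.children[idx], consumed + bits, self.stride
        # Controlled prefix expansion: the prefix covers 2**free slots here;
        # keep any slot already owned by a longer (more specific) prefix.
        free = consumed + bits - length
        base = (prefix >> (32 - consumed - bits)) & ((1 << bits) - 1)
        base &= ~((1 << free) - 1)
        for idx in range(base, base + (1 << free)):
            if node.lens[idx] <= length:
                node.hops[idx], node.lens[idx] = hop, length

    def lookup(self, addr: int):
        """Longest-prefix match; returns (next_hop, memory_accesses)."""
        node, consumed, bits = self.root, 0, self.root_bits
        best, accesses = None, 0
        while node is not None:
            idx = (addr >> (32 - consumed - bits)) & ((1 << bits) - 1)
            accesses += 1  # one slot read per trie level visited
            if node.hops[idx] is not None:
                best = node.hops[idx]
            node, consumed, bits = node.children[idx], consumed + bits, self.stride
        return best, accesses

    def slots(self) -> int:
        """Total array entries, a rough proxy for required memory capacity."""
        return (1 << self.root_bits) + (self.nodes - 1) * (1 << self.stride)


def ip(dotted: str) -> int:
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


# Toy routing table (illustrative only, not data from the paper).
ROUTES = [("0.0.0.0", 0, "A"), ("10.0.0.0", 8, "B"),
          ("10.1.0.0", 16, "C"), ("10.1.2.0", 24, "D")]


def build(root_bits: int) -> MultibitTrie:
    trie = MultibitTrie(root_bits)
    for prefix, length, hop in ROUTES:
        trie.insert(ip(prefix), length, hop)
    return trie


if __name__ == "__main__":
    for k in (8, 12, 16):
        trie = build(k)
        hop, acc = trie.lookup(ip("10.1.2.3"))
        print(f"root 2^{k}: {trie.slots()} slots, {acc} accesses -> {hop}")
```

On this toy table, raising the root stride from 8 to 12 to 16 bits cuts the accesses needed to resolve 10.1.2.3 from 5 to 4 to 3, while the slot count grows from 320 to 4,144 to 65,568. A wide root shortens the access chain but costs memory. That is the tension the root branching factor controls. Real routing tables and the LC-trie's adaptive strides will shift the exact numbers.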