We investigate the implementation of IP look-up for core routers using multiple microengines and a tailored memory hierarchy. The main architectural concerns are limiting the number of and contention for memory accesses. Using a level compressed trie as an index, we show the impact of the main parameter, the root branching factor, on the memory capacity and number of memory accesses. Despite the lack of locality, we show how a cache can reduce the required memory capacity and limit the amount of expensive multibanking. Results of simulation experiments using contemporary routing tables show that the architecture scales well, at least up to 16 processors, and that the presence of a small on-chip cache increases throughput significantly, up to 65% over an architecture with the same number of processors but without a cache, all while reducing the amount of required off-chip memory.
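The trade-off governed by the root branching factor can be illustrated with a small sketch. This is not the level compressed trie described above: it is a simplified stand-in, assuming a directly indexed root table of 2^k entries (k plays the role of the root branching factor) with plain binary subtries below it. The names `RootedTrie`, `ROUTES`, and `build`, and the sample routes, are hypothetical and chosen only for illustration. Each read of the root table or of a trie node is counted as one memory access.

```python
import ipaddress


class Node:
    """One binary trie node below the root table."""
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]
        self.next_hop = None


class RootedTrie:
    """Root table of 2**root_bits slots over binary subtries (illustrative only)."""

    def __init__(self, root_bits):
        self.root_bits = root_bits
        size = 1 << root_bits
        self.root_hop = [None] * size   # next hop from prefixes no longer than root_bits
        self.root_len = [-1] * size     # length of the prefix that set root_hop
        self.subtrie = [None] * size    # subtrie for longer prefixes
        self.node_count = 0             # trie nodes allocated below the root

    def insert(self, cidr, next_hop):
        net = ipaddress.ip_network(cidr)
        prefix, length = int(net.network_address), net.prefixlen
        k = self.root_bits
        if length <= k:
            # Short prefix: expand it over every root slot it covers.
            span = 1 << (k - length)
            base = (prefix >> (32 - k)) & ~(span - 1)
            for slot in range(base, base + span):
                if length >= self.root_len[slot]:
                    self.root_len[slot] = length
                    self.root_hop[slot] = next_hop
            return
        slot = prefix >> (32 - k)
        if self.subtrie[slot] is None:
            self.subtrie[slot] = Node()
            self.node_count += 1
        node = self.subtrie[slot]
        for i in range(k, length):
            bit = (prefix >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = Node()
                self.node_count += 1
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address):
        """Return (longest-match next hop, number of memory accesses)."""
        addr = int(ipaddress.ip_address(address))
        k = self.root_bits
        slot = addr >> (32 - k)
        accesses = 1                    # the root table read
        best = self.root_hop[slot]
        node, i = self.subtrie[slot], k
        while node is not None:
            accesses += 1               # one read per trie node visited
            if node.next_hop is not None:
                best = node.next_hop
            if i == 32:
                break
            node = node.children[(addr >> (31 - i)) & 1]
            i += 1
        return best, accesses


# Hypothetical sample routes, not taken from any real routing table.
ROUTES = [("0.0.0.0/0", "default"), ("10.0.0.0/8", "A"),
          ("10.1.0.0/16", "B"), ("10.1.2.0/24", "C")]


def build(root_bits, routes=ROUTES):
    trie = RootedTrie(root_bits)
    for cidr, hop in routes:
        trie.insert(cidr, hop)
    return trie


if __name__ == "__main__":
    for k in (8, 16):
        t = build(k)
        hop, accesses = t.lookup("10.1.2.3")
        print(f"root_bits={k}: root slots={1 << k}, trie nodes={t.node_count}, "
              f"next hop={hop}, accesses={accesses}")
```

In this sketch a wider root table shortens the path below the root, so a lookup touches fewer memory locations, but the root table itself grows exponentially with k. That is the same tension between memory capacity and number of memory accesses that the abstract attributes to the root branching factor, shown here only under the simplifying assumptions stated above.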