Efficient use of memory bandwidth to improve network processor throughput

Authors:
Jahangir Hasan;Satish Chandra;T. N. Vijaykumar
Affiliations:
Purdue University;IBM Corporation;Purdue University
Venue:
Proceedings of the 30th annual international symposium on Computer architecture
Year:
2003

Abstract

We consider the efficiency of packet buffers used in packet switches built using network processors (NPs). Packet buffers are typically implemented using DRAM, which provides plentiful buffering at a reasonable cost. The problem we address is that a typical NP workload may be unable to utilize the peak DRAM bandwidth. Because the bandwidth of the packet buffer is often the performance bottleneck of a shared-memory packet switch, inefficient use of the available DRAM bandwidth further reduces packet throughput. Specialized hardware-based schemes that alleviate the DRAM bandwidth problem in high-end routers may be less applicable to NP-based systems, where cost is an important consideration.

In this paper, we propose cost-effective ways to enhance average-case DRAM bandwidth. In modern DRAMs, successive accesses that fall within the same DRAM row are significantly faster than accesses that fall across rows. If DRAM accesses can be generated differently, or reordered, to take advantage of fast same-row accesses, peak DRAM bandwidth can be approached. The challenge is to exploit this "row locality" despite the unpredictable nature of memory accesses in NPs. We propose a set of simple techniques to meet this challenge: locality-sensitive buffer allocation on packet input, reordering of DRAM accesses to increase locality, and prefetching to reduce the row-miss penalty. We evaluate these techniques with cycle-accurate simulations of Intel's IXP 1200 network processor and find that, for a set of common NP applications processing a real trace, they boost packet throughput by 42.7% on average while utilizing nearly the peak DRAM bandwidth.
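To make the row-locality argument concrete, the sketch below models a DRAM in which each bank keeps one row open: an access to the open row is a cheap "row hit", while any other row costs a "row miss". It then applies a simple reordering that batches pending accesses by (bank, row) in order of their oldest request. This is a generic illustration, not the paper's controller or the IXP 1200's memory interface. The cycle counts (`ROW_HIT_CYCLES`, `ROW_MISS_CYCLES`), the `Access` type, and both functions are hypothetical, and the model assumes the pending accesses are independent, so that reordering them is always legal.

```python
from dataclasses import dataclass

# Hypothetical timings in cycles; real DRAM row-hit/row-miss costs differ.
ROW_HIT_CYCLES = 2
ROW_MISS_CYCLES = 8


@dataclass(frozen=True)
class Access:
    bank: int
    row: int


def service_time(accesses):
    """Cycles to serve accesses in the given order, one open row per bank."""
    open_row = {}  # bank -> currently open row
    cycles = 0
    for a in accesses:
        if open_row.get(a.bank) == a.row:
            cycles += ROW_HIT_CYCLES  # same row as the last access to this bank
        else:
            cycles += ROW_MISS_CYCLES  # close the old row, open the new one
            open_row[a.bank] = a.row
    return cycles


def reorder_for_row_locality(pending):
    """Batch accesses to the same (bank, row), oldest group first.

    Assumes the accesses are independent (no ordering hazards between them).
    """
    groups = {}
    for a in pending:
        groups.setdefault((a.bank, a.row), []).append(a)
    # dicts keep insertion order, so groups follow their oldest access
    return [a for group in groups.values() for a in group]


if __name__ == "__main__":
    # Two packets' accesses interleaved on bank 0, rows 1 and 2.
    pending = [Access(0, 1), Access(0, 2), Access(0, 1), Access(0, 2)]
    print(service_time(pending))                            # 32 (all misses)
    print(service_time(reorder_for_row_locality(pending)))  # 20 (two hits)
```

Interleaved accesses from different packets repeatedly evict each other's rows. Grouping them turns half of the misses into hits in this toy example. A real memory controller would also have to bound how long an access may be deferred and preserve ordering between dependent accesses to the same address.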