Design and evaluation of a DRAM-based shared memory ATM switch
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Scalable high speed IP routing lookups
SIGCOMM '97 Proceedings of the ACM SIGCOMM '97 conference on Applications, technologies, architectures, and protocols for computer communication
IEEE/ACM Transactions on Networking (TON)
IP packet generation: statistical models for TCP start times based on connection-rate superposition
Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Proceedings of the 27th annual international symposium on Computer architecture
ACM Transactions on Computer Systems (TOCS)
Building a robust software-based router using network processors
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Using Cohort-Scheduling to Enhance Server Performance
ATEC '02 Proceedings of the General Track of the annual conference on USENIX Annual Technical Conference
Access Order and Effective Bandwidth for Streams on a Direct Rambus Memory
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Reducing DRAM Latencies with an Integrated Memory Hierarchy Design
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Issues and trends in router design
IEEE Communications Magazine
Technologies and building blocks for fast packet forwarding
IEEE Communications Magazine
Design and Implementation of High-Performance Memory Systems for Future Packet Buffers
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
A heterogeneously segmented cache architecture for a packet forwarding engine
Proceedings of the 19th annual international conference on Supercomputing
Overcoming the memory wall in packet processing: hammers or ladders?
Proceedings of the 2005 ACM symposium on Architecture for networking and communications systems
Design of an efficient memory subsystem for network processor
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Exploiting locality to ameliorate packet queue contention and serialization
Proceedings of the 3rd conference on Computing frontiers
A DRAM/SRAM Memory Scheme for Fast Packet Buffers
IEEE Transactions on Computers
The bit-reversal SDRAM address mapping
SCOPES '05 Proceedings of the 2005 workshop on Software and compilers for embedded systems
Virtually Pipelined Network Memory
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Conserving network processor power consumption by exploiting traffic variability
ACM Transactions on Architecture and Code Optimization (TACO)
Reconciling performance and programmability in networking systems
Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications
High-bandwidth network memory system through virtual pipelines
IEEE/ACM Transactions on Networking (TON)
Compiler assisted dynamic management of registers for network processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Optimizing packet accesses for a domain specific language on network processors
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Improving latency tolerance of network processors through simultaneous multithreading
APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
Advanced packet segmentation and buffering algorithms in network processors
Transactions on High-Performance Embedded Architectures and Compilers IV
Hi-index | 0.00 |
We consider the efficiency of packet buffers used in packet switches built using network processors (NPs). Packet buffers are typically implemented using DRAM, which provides plentiful buffering at a reasonable cost. The problem we address is that a typical NP workload may be unable to utilize the peak DRAM bandwidth. Since the bandwidth of the packet buffer is often the bottleneck in the performance of a shared-memory packet switch, inefficient use of available DRAM bandwidth further reduces the packet throughput. Specialized hardware-based schemes that alleviate the DRAM bandwith problem in high-end routers may be less applicable to NP-based systems, in which cost is an important consideration.In this paper, we propose cost-effective ways to enhance average-case DRAM bandwidth. In modern DRAMs, successive accesses falling within the same DRAM row are significantly faster than those falling across rows. If accesses to DRAM can be generated differently or reordered to take advantage of fast same-row accesses, peak DRAM bandwidth can be approached. The challenge is in exploiting this "row locality" despite the unpredictable nature of memory accesses in NPs. We propose a set of simple techniques to meet this challenge. These include locality-sensitive buffer allocation on packet input, reordering DRAM accesses to increase locality, and prefetching to reduce row miss penalty. We evaluate our techniques on cycle-accurate simulations of Intel's IXP 1200 network processor and find that they boost packet throughput on average by 42.7%, utilizing nearly the peak DRAM bandwidth, for a set of common NP applications processing a real trace.