PacketShader: a GPU-accelerated software router

Authors:
Sangjin Han; Keon Jang; KyoungSoo Park; Sue Moon
Affiliations:
KAIST, Daejeon, South Korea; KAIST, Daejeon, South Korea; KAIST, Daejeon, South Korea; KAIST, Daejeon, South Korea
Venue:
Proceedings of the ACM SIGCOMM 2010 conference
Year:
2010

Abstract

We present PacketShader, a high-performance software router framework for general packet processing with Graphics Processing Unit (GPU) acceleration. PacketShader exploits the massively parallel processing power of GPUs to address the CPU bottleneck in current software routers. Combined with our high-performance packet I/O engine, PacketShader outperforms existing software routers by more than a factor of four, forwarding 64B IPv4 packets at 39 Gbps on a single commodity PC. We have implemented IPv4 and IPv6 forwarding, OpenFlow switching, and IPsec tunneling to demonstrate the flexibility and performance advantage of PacketShader. The evaluation results show that the GPU delivers significantly higher throughput than the CPU-only implementation, confirming the effectiveness of GPUs for computation- and memory-intensive operations in packet processing.
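
To give a sense of scale for the 39 Gbps figure with 64B packets, the following sketch converts a throughput number into an implied packet rate and a per-packet time budget. It is an illustration only: the function names are hypothetical, and `overhead_bytes` is an assumed parameter, since the abstract does not say whether the 39 Gbps figure counts per-packet Ethernet framing overhead.

```python
def packets_per_second(throughput_gbps, packet_bytes, overhead_bytes=0):
    """Packet rate implied by a throughput figure.

    overhead_bytes is an ASSUMPTION: extra bytes counted per packet
    (for example, Ethernet framing). The abstract does not specify it.
    """
    bits_per_packet = (packet_bytes + overhead_bytes) * 8
    return throughput_gbps * 1e9 / bits_per_packet


def per_packet_budget_ns(throughput_gbps, packet_bytes, overhead_bytes=0):
    """Average time available per packet, in nanoseconds, at that rate."""
    return 1e9 / packets_per_second(throughput_gbps, packet_bytes, overhead_bytes)


if __name__ == "__main__":
    # Compare the no-overhead case with an assumed 20 bytes of framing.
    for overhead in (0, 20):
        rate = packets_per_second(39, 64, overhead)
        budget = per_packet_budget_ns(39, 64, overhead)
        print(f"overhead={overhead}B: {rate / 1e6:.1f} Mpps, {budget:.1f} ns per packet")
```

Under either assumption the result is tens of millions of packets per second, leaving only a few tens of nanoseconds of processing time per packet on average, which is the regime in which the abstract's CPU bottleneck arises.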