Distributed runtime load-balancing for software routers on homogeneous many-core processors

Authors:
Qiang Wu;Dilip Joy Mampilly;Tilman Wolf
Affiliations:
University of Massachusetts, Amherst, MA;University of Massachusetts, Amherst, MA;University of Massachusetts, Amherst, MA
Venue:
Proceedings of the Workshop on Programmable Routers for Extensible Services of Tomorrow
Year:
2010

Abstract

With the advent of diversified network services and programmability deployed in the network infrastructure, the functionality of the data path in network systems has moved from "store-and-forward" toward "store-process-forward." However, because of software bottlenecks, the processing performance of many contemporary software routers does not scale with the increasing number of processor cores integrated on a chip. To tackle one aspect of this problem, we propose a distributed algorithm that can load-balance packet processing workloads on a modern many-core architecture. The algorithm exploits parallelism and achieves load balancing by distributing processing tasks across different local regions of the chip. Workload distribution at chip level can be achieved with O(n log n) time complexity and thus can scale to large configurations.