ACM Transactions on Computer Systems (TOCS)
ACM Computing Surveys (CSUR)
Building a robust software-based router using network processors
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Flexible Control of Parallelism in a Multiprocessor PC Router
Proceedings of the General Track: 2002 USENIX Annual Technical Conference
Design and Analysis of Distributed Algorithms (Wiley Series on Parallel and Distributed Computing)
Design and Analysis of Distributed Algorithms (Wiley Series on Parallel and Distributed Computing)
Proceedings of the ACM workshop on Programmable routers for extensible services of tomorrow
On runtime management in multi-core packet processing systems
Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Factored operating systems (fos): the case for a scalable operating system for multicores
ACM SIGOPS Operating Systems Review
Design of a network service processing platform for data path customization
Proceedings of the 2nd ACM SIGCOMM workshop on Programmable routers for extensible services of tomorrow
RouteBricks: exploiting parallelism to scale software routers
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
PacketShader: a GPU-accelerated software router
Proceedings of the ACM SIGCOMM 2010 conference
Fair multithreading on packet processors for scalable network virtualization
Proceedings of the 6th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
The 48-core SCC Processor: the Programmer's View
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
NetBump: user-extensible active queue management with bumps on the wire
Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems
Hi-index | 0.00 |
With the advent of diversifie network services and programmability deployed in the network infrastructure, the functionality of the data path in network systems has moved from "store-and-forward" toward "store-process-forward." However, the processing performance of many contemporary software routers does not scale with the increasing number of processor cores that are integrated on a chip due to software bottlenecks. To tackle one aspect of this problem, we propose a distributed algorithm that can load-balance packet processing workloads on a modern many-core architecture. The algorithm exploits parallelism and achieves load balancing by distributing processing task across different local regions of the chi. Workload distribution at chip level can be achieved with an O(n log n) time complexity and thus can scale to large configurations