Today's large multi-core Internet servers support thousands of concurrent connections, or flows. The computational capacity of future server platforms will depend on increasing numbers of cores. The key to ensuring that performance scales with cores is to design systems software and hardware that fully exploit the parallelism inherent in independent network flows. However, performance scaling on commercial web servers has proven elusive. This paper identifies the major bottlenecks to scalability for a reference server workload on a commercial server platform. We determined that on a web server running a modified SPECweb2005 Support workload, throughput scales only 4.8x on eight cores. Our results show that, with few exceptions, the operating system, TCP/IP stack, and application exploited flow-level parallelism well, and that load imbalance and the shared cache had little effect on performance. Having eliminated these potential bottlenecks, we determined that performance scaling was limited by the capacity of the address bus, which became saturated with all eight cores in use. If this key obstacle is addressed, commercial web servers and systems software are well positioned to scale to a large number of cores.
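To put the reported 4.8x speedup on eight cores in perspective, the following minimal sketch computes two standard scaling summaries. The first is parallel efficiency, S/p. The second is the Karp-Flatt experimentally determined serial fraction, e = (1/S - 1/p) / (1 - 1/p). Neither metric appears in the abstract; both are assumed here purely for illustration, and the function names are hypothetical. Karp-Flatt folds every cost that does not scale into a single "serial" number, including hardware contention such as the address-bus saturation the authors identify. It therefore describes the symptom, not the cause.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling achieved: S / p."""
    return speedup / cores


def karp_flatt(speedup: float, cores: int) -> float:
    """Experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p).

    Serial code, lock contention, and a saturated shared resource
    (e.g. a bus) all inflate e; the metric cannot tell them apart.
    """
    if cores < 2:
        raise ValueError("Karp-Flatt metric needs at least two cores")
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


if __name__ == "__main__":
    s, p = 4.8, 8  # reported result: throughput scales 4.8x on eight cores
    e = karp_flatt(s, p)
    print(f"efficiency = {parallel_efficiency(s, p):.2f}")      # 0.60
    print(f"effective serial fraction = {e:.3f}")               # 0.095
    # Round trip: Amdahl's law with this e reproduces the measured speedup.
    print(f"Amdahl speedup at 8 cores = {amdahl_speedup(e, p):.2f}")  # 4.80
```

An effective serial fraction of roughly 9.5% means eight cores deliver only 60% of linear scaling. Extrapolating to more cores with Amdahl's law would assume the overhead behaves like fixed serial work. A saturated bus need not behave that way, so any such projection should be treated with caution.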