Performance scalability of a multi-core web server

Authors:
Bryan Veal; Annie Foong
Affiliations:
Intel Corporation, Hillsboro, OR; Intel Corporation, Hillsboro, OR
Venue:
Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems
Year:
2007


Abstract

Today's large multi-core Internet servers support thousands of concurrent connections, or flows. The computational ability of future server platforms will depend on increasing numbers of cores. The key to ensuring that performance scales with the number of cores is to design systems software and hardware to fully exploit the parallelism inherent in independent network flows. However, performance scaling on commercial web servers has proven elusive. This paper identifies the major bottlenecks to scalability for a reference server workload on a commercial server platform. We determined that on a web server running a modified SPECweb2005 Support workload, throughput scales only 4.8x on eight cores. Our results show that the operating system, TCP/IP stack, and application exploited flow-level parallelism well, with few exceptions, and that load imbalance and shared cache had little effect on performance. Having eliminated these potential bottlenecks, we determined that performance scaling was limited by the capacity of the address bus, which became saturated when all eight cores were in use. If this key obstacle is addressed, commercial web server and systems software are well positioned to scale to a large number of cores.
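
To put the reported 4.8x throughput scaling on eight cores in perspective, the following minimal sketch computes two standard scaling metrics from those two numbers alone. The function names are illustrative and not taken from the paper. The Karp-Flatt metric is a generic diagnostic brought in here as an assumption; the paper itself attributes the limit to address-bus saturation, not to serial code, so the second figure is better read as "equivalent non-scaling fraction" than as a measured property of the server.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return speedup / cores


def karp_flatt(speedup: float, cores: int) -> float:
    """Karp-Flatt metric: the non-scaling fraction that would explain
    the observed speedup under an Amdahl-style model.
    Assumption: used only as a rough diagnostic, not the paper's method."""
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


# Figures taken from the abstract: 4.8x throughput on eight cores.
observed_speedup = 4.8
core_count = 8

print(f"parallel efficiency: {parallel_efficiency(observed_speedup, core_count):.2f}")
print(f"Karp-Flatt fraction: {karp_flatt(observed_speedup, core_count):.3f}")
```

Under these assumptions the server achieves about 60% of linear scaling, which is equivalent to roughly 9.5% of the work failing to scale with added cores.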