Comparing high-performance multi-core web-server architectures

Authors: Ashif S. Harji; Peter A. Buhr; Tim Brecht
Affiliations: University of Waterloo (all three authors)
Venue: Proceedings of the 5th Annual International Systems and Storage Conference
Year: 2012

Abstract

In this paper, we study how web-server architecture and implementation affect performance when trying to obtain high throughput on a 4-core system servicing static content. We focus on static content because a growing number of servers are dedicated to workloads composed of songs, photos, software, and videos chunked for HTTP downloads. Two representative static-content workloads are used: one serviced entirely from the file-system cache and the other requiring significant disk I/O. We focus on 4-core systems because: 1) it is a widely used configuration in data centers and cloud services, 2) recent studies show large SMP systems may operate more efficiently when subdivided into smaller subsystems, 3) understanding performance with a smaller number of cores is essential before scaling to a larger number of cores, and 4) 4 cores may be sufficient for many web servers. Two high-performance web servers, with event-driven (μserver) and pipelined (WatPipe) architectures, are developed and tested for a multi-core environment. With careful implementation and tuning, both servers achieve performance comparable to running independent copies of the server on each processor (N-copy). The new web servers achieve high throughput (4,000 to 6,000 Mbps) with 40,000 to 70,000 connections/second; in all cases, their performance is better than that of nginx, lighttpd, and Apache. We conclude that implementation and tuning of web servers is perhaps more important than server architecture. We also find it is better to use blocking rather than non-blocking calls to sendfile when the requested files do not all fit in the file-system cache.
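The final finding contrasts blocking and non-blocking calls to sendfile. The sketch below illustrates the two calling patterns only. It uses Python's `os.sendfile`, a thin wrapper over the `sendfile(2)` system call. The helper names `send_file_blocking` and `send_file_nonblocking` are hypothetical; they are not taken from μserver or WatPipe. The sketch does not reproduce the paper's experiments. Because the non-blocking flag is set on the socket, it governs waiting for socket buffer space, not reading file data from disk.

```python
import os
import socket


def send_file_blocking(sock: socket.socket, fd: int, size: int) -> int:
    """Blocking pattern (illustrative): loop until `size` bytes are sent.

    Each os.sendfile call may transfer fewer bytes than requested, so the
    file offset is advanced until the whole file has been handed to the socket.
    Side effect: puts the socket into blocking mode.
    """
    sock.setblocking(True)
    offset = 0
    while offset < size:
        sent = os.sendfile(sock.fileno(), fd, offset, size - offset)
        if sent == 0:  # reached end of file earlier than expected
            break
        offset += sent
    return offset


def send_file_nonblocking(sock: socket.socket, fd: int, offset: int, size: int) -> int:
    """Non-blocking pattern (illustrative): send only what the socket accepts now.

    Returns the updated file offset. An event-driven server would store this
    offset and call again when its event loop reports the socket writable.
    Side effect: puts the socket into non-blocking mode.
    """
    sock.setblocking(False)
    try:
        sent = os.sendfile(sock.fileno(), fd, offset, size - offset)
    except BlockingIOError:
        return offset  # socket send buffer is full; retry when writable
    return offset + sent
```

With the blocking call, the calling thread is tied up until the whole file has been transferred. With the non-blocking call, a single event loop can interleave many partially sent responses by tracking one offset per connection.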