Networking support for large scale multiprocessor servers

Authors:
David J. Yates;Erich M. Nahum;James F. Kurose;Don Towsley
Affiliations:
Department of Computer Science, University of Massachusetts, Amherst, MA;Department of Computer Science, University of Massachusetts, Amherst, MA;Department of Computer Science, University of Massachusetts, Amherst, MA;Department of Computer Science, University of Massachusetts, Amherst, MA
Venue:
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Year:
1996

Citing 15

The packet filter: an efficient mechanism for user-level network code
SOSP '87 Proceedings of the eleventh ACM Symposium on Operating systems principles

Transport protocol processing at GBPS rates
SIGCOMM '90 Proceedings of the ACM symposium on Communications architectures & protocols

The X-Kernel: An Architecture for Implementing Network Protocols
IEEE Transactions on Software Engineering

Implementing network protocols at user level
SIGCOMM '93 Conference proceedings on Communications architectures, protocols and applications

Locking effects in multiprocessor implementations of protocols
SIGCOMM '93 Conference proceedings on Communications architectures, protocols and applications

Fbufs: a high-bandwidth cross-domain transfer facility
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles

Protocol service decomposition for high-performance networking
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles

Experiences with a high-speed network adaptor: a software perspective
SIGCOMM '94 Proceedings of the conference on Communications architectures, protocols and applications

A Parallel Approach to OSI Connection-Oriented Protocols
Proceedings of the IFIP WG6.1/WG6.4 Third International Workshop on Protocols for High-Speed Networks III

A Scalable Multi-Discipline, Multiple-Processor Scheduling Framework for IRIX
IPPS '95 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing

A High-Speed Protocol Parallel Implementation: Design and Analysis
Proceedings of the IFIP TC6/WG6.4 Fourth International Conference on High Performance Networking IV

The performance impact of scheduling for cache affinity in parallel network processing
HPDC '95 Proceedings of the 4th IEEE International Symposium on High Performance Distributed Computing

Parallelized Network Security Protocols
SNDSS '96 Proceedings of the 1996 Symposium on Network and Distributed System Security

Measuring the performance of parallel message-based process architectures
INFOCOM '95 Proceedings of the Fourteenth Annual Joint Conference of the IEEE Computer and Communication Societies - Volume 2

The effectiveness of affinity-based scheduling in multiprocessor networking
INFOCOM '96 Proceedings of the Fifteenth Annual Joint Conference of the IEEE Computer and Communications Societies - Volume 1

Cited 6

The effectiveness of affinity-based scheduling in multiprocessor network protocol processing (extended version)
IEEE/ACM Transactions on Networking (TON)

Cache behavior of network protocols
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems

Performance modeling of multiprocessor implementations of protocols
IEEE/ACM Transactions on Networking (TON)

Locality-aware request distribution in cluster-based network servers
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems

Signals, timers, and continuations for multithreaded user-level protocols
Software—Practice & Experience - Research Articles

Improving network connection locality on multicore systems
Proceedings of the 7th ACM european conference on Computer Systems

Abstract

Over the next several years the performance demands on globally available information servers are expected to increase dramatically. These servers must be capable of sending and receiving data over hundreds or even thousands of simultaneous connections. In this paper, we show that connection-level parallel protocols (where different connections are processed in parallel) running on a shared-memory multiprocessor can deliver high network bandwidth across a large number of connections.

We experimentally evaluate connection-level parallel implementations of both TCP/IP and UDP/IP protocol stacks. We focus on three questions in our performance evaluation: how throughput scales with the number of processors, how throughput changes as the number of connections increases, and how fairly the aggregate bandwidth is distributed across connections. We show how several factors impact performance: the number of processors used, the number of threads in the system, the number of connections assigned to each thread, and the type of protocols in the stack (i.e., TCP versus UDP).

Our results show that with careful implementation connection-level parallel protocol stacks scale well with the number of processors, and deliver high throughput which is, for the most part, sustained as the number of connections increases. Maximizing the number of threads in the system yields the best overall throughput. However, the best fairness behavior is achieved by matching the number of threads to the number of processors and scheduling connections assigned to threads in a round-robin manner.
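
The round-robin arrangement described in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the modulo placement of connections onto threads, the fixed number of service quanta per thread, and the use of Jain's fairness index as the fairness measure are all assumptions made for illustration, and the function names are hypothetical.

```python
from collections import deque


def assign_connections(num_connections, num_threads):
    """Place each connection id on a thread by modulo (an assumed policy)."""
    threads = [deque() for _ in range(num_threads)]
    for conn in range(num_connections):
        threads[conn % num_threads].append(conn)
    return threads


def serve_round_robin(threads, quanta_per_thread):
    """Each thread serves one packet per quantum, rotating over its connections."""
    served = {conn: 0 for queue in threads for conn in queue}
    for queue in threads:
        if not queue:
            continue  # a thread with no connections has nothing to serve
        for _ in range(quanta_per_thread):
            conn = queue[0]
            served[conn] += 1
            queue.rotate(-1)  # move the served connection to the back
    return served


def jain_index(values):
    """Jain's fairness index: 1.0 means all connections got equal service."""
    values = list(values)
    total = sum(values)
    squares = sum(v * v for v in values)
    return (total * total) / (len(values) * squares) if squares else 1.0


# Assumed scenario: 4 threads (one per processor), 8 connections.
threads = assign_connections(num_connections=8, num_threads=4)
served = serve_round_robin(threads, quanta_per_thread=10)
fairness = jain_index(served.values())
```

In this model every thread receives the same number of quanta, so service is even whenever connections divide evenly among threads. With 6 connections on 4 threads, two threads carry two connections and two carry one, and the index drops below 1.0, which shows how the number of connections per thread can affect fairness in such a scheme.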