Over the next several years the performance demands on globally available information servers are expected to increase dramatically. These servers must be capable of sending and receiving data over hundreds or even thousands of simultaneous connections. In this paper, we show that connection-level parallel protocols (where different connections are processed in parallel) running on a shared-memory multiprocessor can deliver high network bandwidth across a large number of connections.

We experimentally evaluate connection-level parallel implementations of both TCP/IP and UDP/IP protocol stacks. We focus on three questions in our performance evaluation: how throughput scales with the number of processors, how throughput changes as the number of connections increases, and how fairly the aggregate bandwidth is distributed across connections. We show how several factors impact performance: the number of processors used, the number of threads in the system, the number of connections assigned to each thread, and the type of protocols in the stack (i.e., TCP versus UDP).

Our results show that, with careful implementation, connection-level parallel protocol stacks scale well with the number of processors and deliver high throughput which is, for the most part, sustained as the number of connections increases. Maximizing the number of threads in the system yields the best overall throughput. However, the best fairness behavior is achieved by matching the number of threads to the number of processors and scheduling connections assigned to threads in a round-robin manner.
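
The fairness-oriented configuration described in the abstract (connections partitioned across threads, with each thread serving its connections round-robin) can be illustrated with a small sketch. This is a single-process simulation under stated assumptions, not the paper's implementation: the function names `assign_connections` and `round_robin_service`, the modulo assignment rule, and the `quantum` parameter are all hypothetical and chosen only for illustration.

```python
from collections import deque


def assign_connections(connection_ids, num_threads):
    """Partition connections across threads.

    Assumption: connection i goes to thread (i mod num_threads). The
    abstract does not say how connections are assigned to threads.
    """
    buckets = [[] for _ in range(num_threads)]
    for i, conn in enumerate(connection_ids):
        buckets[i % num_threads].append(conn)
    return buckets


def round_robin_service(pending, quantum=1):
    """Serve one thread's connections in round-robin order.

    `pending` maps a connection id to the number of packets waiting on it.
    Each turn, a connection may have at most `quantum` packets processed
    before it moves to the back of the queue. Returns the sequence of
    connection ids in the order their packets were processed.
    """
    remaining = dict(pending)
    queue = deque(conn for conn, count in remaining.items() if count > 0)
    order = []
    while queue:
        conn = queue.popleft()
        served = min(quantum, remaining[conn])
        order.extend([conn] * served)
        remaining[conn] -= served
        if remaining[conn] > 0:
            queue.append(conn)  # still has work: rejoin at the back
    return order


if __name__ == "__main__":
    # Five connections spread over two threads (e.g. one per processor).
    print(assign_connections(list(range(5)), num_threads=2))
    # One thread's backlog, served one packet per turn.
    print(round_robin_service({"a": 2, "b": 1, "c": 3}))
```

With a quantum of one packet, no connection with a long backlog can starve the others assigned to the same thread, which is the intuition behind the round-robin fairness result.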