Speeding up protocols for small messages

Authors: Trevor Blackwell
Affiliations: Harvard University, 29 Oxford St, Cambridge, MA
Venue: Conference proceedings on Applications, technologies, architectures, and protocols for computer communications
Year: 1996

Abstract

Many techniques have been discovered to improve the performance of bulk data transfer protocols that use large messages. This paper describes a technique that improves the performance of protocols that use small messages, such as signalling protocols, by reducing memory system penalties. Detailed measurements show that for TCP, most memory system costs are due to poor locality in the protocol code itself rather than to the movement of data. We present a new technique, analogous to blocked matrix multiplication, for scheduling layer processing to reduce memory system costs, and analyze its performance in a synthetic environment.
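
The scheduling idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, which the abstract does not describe. It assumes that each protocol layer is a plain function, and that running one layer over a small batch of messages before moving to the next layer keeps that layer's code resident in the cache for longer. The names `make_layer`, `process_per_message`, `process_blocked`, and `layer_switches` are hypothetical, and the count of layer switches is only a rough stand-in for instruction-cache reloads, not a measurement.

```python
from typing import Callable, List

Layer = Callable[[bytes], bytes]


def make_layer(name: str, trace: List[str]) -> Layer:
    """Build a hypothetical layer that strips a 1-byte header and logs its name."""
    def layer(msg: bytes) -> bytes:
        trace.append(name)   # record which layer's code ran
        return msg[1:]       # pretend to consume this layer's header
    return layer


def process_per_message(layers: List[Layer], messages: List[bytes]) -> List[bytes]:
    """Conventional order: each message passes through every layer in turn."""
    out = []
    for msg in messages:
        for layer in layers:
            msg = layer(msg)
        out.append(msg)
    return out


def process_blocked(layers: List[Layer], messages: List[bytes],
                    batch_size: int) -> List[bytes]:
    """Blocked order: run one layer over a whole batch before the next layer."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    out = []
    for start in range(0, len(messages), batch_size):
        batch = messages[start:start + batch_size]
        for layer in layers:
            batch = [layer(m) for m in batch]
        out.extend(batch)
    return out


def layer_switches(trace: List[str]) -> int:
    """Count transitions between different layers in the execution trace."""
    return sum(1 for a, b in zip(trace, trace[1:]) if a != b)
```

Both orderings produce the same output messages. Only the order in which layer code executes differs, and a larger batch gives fewer switches between layers. In a real protocol stack the batch size would presumably be limited by how many messages are queued and by the latency that batching adds.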