This paper describes several techniques designed to improve protocol latency and reports on their effectiveness when measured on a modern RISC machine employing the DEC Alpha processor. We found that the memory system, which has long been known to dominate network throughput, is also a key factor in protocol latency. As a result, improving instruction cache effectiveness can greatly reduce protocol processing overheads. An important metric in this context is memory cycles per instruction (mCPI): the average number of cycles that an instruction stalls waiting for a memory access to complete. The techniques presented in this paper reduce the mCPI by a factor of 1.35 to 5.8. In analyzing the effectiveness of these techniques, we also present a detailed study of the protocol processing behavior of two protocol stacks, TCP/IP and RPC, on a modern RISC processor.
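To make the mCPI metric concrete, here is a minimal sketch that computes it from cycle and stall counts and shows what "reducing the mCPI by a factor of N" means. The `RunCounters` type, its field names, and all numbers are hypothetical and chosen only for illustration; they are not measurements from the paper, and the sketch does not reproduce the paper's measurement method.

```python
from dataclasses import dataclass


@dataclass
class RunCounters:
    """Hypothetical counter values for one protocol-processing run (illustrative only)."""
    instructions: int         # instructions executed
    total_cycles: int         # total cycles spent
    memory_stall_cycles: int  # cycles spent stalled waiting for memory accesses


def mcpi(c: RunCounters) -> float:
    """Memory cycles per instruction: average stall cycles per executed instruction."""
    return c.memory_stall_cycles / c.instructions


def cpi(c: RunCounters) -> float:
    """Overall cycles per instruction (memory stalls plus everything else)."""
    return c.total_cycles / c.instructions


def reduction_factor(before: RunCounters, after: RunCounters) -> float:
    """How many times smaller the mCPI became (2.0 means it was halved)."""
    return mcpi(before) / mcpi(after)


# Hypothetical numbers, not data from the paper. The instruction count and the
# non-memory cycles (6000 in both runs) are held fixed so that only stalls change.
baseline = RunCounters(instructions=4000, total_cycles=16000, memory_stall_cycles=10000)
optimized = RunCounters(instructions=4000, total_cycles=8000, memory_stall_cycles=2000)

print(f"mCPI before: {mcpi(baseline):.2f}")                              # 2.50
print(f"mCPI after:  {mcpi(optimized):.2f}")                             # 0.50
print(f"reduction:   {reduction_factor(baseline, optimized):.2f}x")      # 5.00x
print(f"CPI before/after: {cpi(baseline):.2f} / {cpi(optimized):.2f}")   # 4.00 / 2.00
```

Because mCPI is normalized per instruction, it isolates memory-system behavior; if a technique also changes the number of instructions executed, total cycles (not mCPI alone) determine the resulting latency.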