Many techniques are known for improving the performance of bulk data transfer protocols that use large messages. This paper describes a technique that improves the performance of protocols that use small messages, such as signalling protocols, by reducing memory-system penalties. Detailed measurements show that, for TCP, most memory-system costs are due to poor locality in the protocol code itself rather than to data movement. We present a new technique, analogous to blocked matrix multiplication, for scheduling layer processing so as to reduce memory-system costs, and we analyze its performance in a synthetic environment.
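The idea behind blocked layer scheduling can be illustrated with a toy model. This is a minimal sketch, not the paper's implementation or measurement setup. It assumes a hypothetical stack in which each layer's code occupies exactly one slot of a small fully associative LRU instruction cache. The names `ICacheModel`, `make_layer`, `process_conventional`, and `process_blocked` are illustrative only. Conventional processing carries each message through every layer before starting the next message. The blocked variant runs one layer over a whole block of messages before moving to the next layer, so that layer's code stays resident while the block is processed. Counted cache misses stand in for memory-system cost.

```python
from collections import OrderedDict
from typing import Callable, List

# Hypothetical model: a layer transforms one message; its code fills one cache slot.
Layer = Callable[[bytes], bytes]


class ICacheModel:
    """Toy fully associative LRU cache tracking which layers' code is resident."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = OrderedDict()  # layer_id -> None, least recently used first
        self.misses = 0

    def touch(self, layer_id: int) -> None:
        if layer_id in self.lines:
            self.lines.move_to_end(layer_id)  # hit: mark most recently used
            return
        self.misses += 1
        self.lines[layer_id] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict least recently used layer


def make_layer(tag: int) -> Layer:
    """Illustrative layer that prepends a one-byte header."""
    return lambda m: bytes([tag]) + m


def process_conventional(msgs: List[bytes], layers: List[Layer],
                         cache: ICacheModel) -> List[bytes]:
    """Message-at-a-time: push each message through all layers before the next."""
    out = []
    for m in msgs:
        for i, layer in enumerate(layers):
            cache.touch(i)
            m = layer(m)
        out.append(m)
    return out


def process_blocked(msgs: List[bytes], layers: List[Layer],
                    cache: ICacheModel, block_size: int) -> List[bytes]:
    """Blocked: apply one layer to a whole block of messages, then the next layer."""
    out = []
    for start in range(0, len(msgs), block_size):
        block = list(msgs[start:start + block_size])
        for i, layer in enumerate(layers):
            processed = []
            for m in block:
                cache.touch(i)  # after the first message, layer i's code is resident
                processed.append(layer(m))
            block = processed
        out.extend(block)
    return out


if __name__ == "__main__":
    layers = [make_layer(t) for t in range(4)]    # 4 layers
    msgs = [bytes([65 + k]) for k in range(16)]  # 16 small messages
    conv, blk = ICacheModel(2), ICacheModel(2)    # cache holds only 2 layers
    assert process_conventional(msgs, layers, conv) == \
        process_blocked(msgs, layers, blk, block_size=8)
    print(conv.misses, blk.misses)  # 64 8
```

With four layers and room for only two in the cache, message-at-a-time processing misses on every layer invocation (16 × 4 = 64 misses). The blocked schedule misses only once per layer per block (4 × 2 = 8), which mirrors how blocked matrix multiplication reuses data while it is still cached. In this sketch the price is that the first message of a block does not leave the stack until the whole block has passed through each layer, so larger blocks trade per-message latency for locality.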