In this paper we present a performance study of memory reference behavior in network protocol processing, using an Internet-based protocol stack implemented in the x-kernel running in user space on a MIPS R4400-based Silicon Graphics machine. We use the protocols to drive a validated execution-driven architectural simulator of our machine. We characterize the behavior of network protocol processing, deriving statistics such as cache miss rates and the percentage of time spent waiting for memory. We also determine how sensitive protocol processing is to the architectural environment, varying factors such as cache size and associativity, and predict performance on future machines.

We show that network protocol cache behavior varies widely, with miss rates ranging from 0 to 28 percent, depending on the scenario. We find that instruction cache behavior has the greatest effect on protocol latency in most cases, and that cold cache behavior is very different from warm cache behavior. We demonstrate the upper bounds on performance that can be expected from improving memory behavior, and the impact of features such as associativity and larger cache sizes. In particular, we find that TCP is more sensitive to cache behavior than UDP, gaining larger benefits from improved associativity and bigger caches. We predict that network protocols will scale well with CPU speeds in the future.
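To make the notions of miss rate and associativity concrete, the following is a minimal illustrative sketch of a set-associative cache with LRU replacement, driven by a list of addresses. It is not the architectural simulator used in the study; the function name `miss_rate`, the cache parameters, and the toy address trace are all assumptions chosen for illustration.

```python
from collections import OrderedDict


def miss_rate(addresses, cache_bytes, line_bytes, ways):
    """Miss rate of a hypothetical set-associative LRU cache over an address trace."""
    num_sets = cache_bytes // (line_bytes * ways)
    # One OrderedDict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes           # cache line number holding this address
        cache_set = sets[line % num_sets]   # set that this line maps to
        if line in cache_set:
            cache_set.move_to_end(line)     # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict the least recently used line
            cache_set[line] = True
    return misses / len(addresses)


# Toy trace (an assumption): two addresses exactly one cache size apart,
# accessed alternately, so they collide in a direct-mapped cache.
trace = [0, 1024] * 4

direct_mapped = miss_rate(trace, cache_bytes=1024, line_bytes=32, ways=1)
two_way = miss_rate(trace, cache_bytes=1024, line_bytes=32, ways=2)
print(direct_mapped, two_way)
```

In this toy trace the two addresses evict each other on every access in the direct-mapped configuration, while the two-way configuration misses only on the first touch of each line. This is the kind of conflict-miss effect that varying associativity is meant to expose, though real protocol traces are far more complex.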