To reap the benefits of high-speed networks, the performance of the host operating system must at least match that of the underlying network. A barrier to achieving high throughput is the cost of copying data within current host architectures. We present a performance comparison of three styles of network device driver designed for a conventional monolithic UNIX kernel. Each driver performs a different number of copies. The zero-copy driver allows the memory on the network adapter to be mapped directly into user address space. This maximises performance at the cost of (1) breaking the semantics of existing network APIs such as BSD sockets and SVR4 TLI, and (2) pushing responsibility for network buffer management up from the kernel into the application layer. The single-copy driver copies data directly between user space and adapter memory, obviating the need for an intermediate copy into kernel buffers in main memory. This approach can be made transparent to existing application code but, like the zero-copy case, relies on an adapter with a generous quantity of on-board memory for buffering network data. The two-copy driver is a conventional STREAMS driver, which sacrifices performance for generality. We observe that the STREAMS overhead for small packets is significant. We report on how the hardware cache ameliorates the cost of the second copy, although we note that streaming network data through the cache reduces the level of cache residency seen by the rest of the system. A barrier to achieving low jitter is the non-deterministic nature of many operating system schedulers. We describe the implementation and report on the performance of a kernel streaming driver that allows data to be copied between a network adapter and another I/O device without involving the process scheduler. This yields higher throughput, greater CPU availability and lower jitter.
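
The difference between the three receive paths can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the drivers described above: the names `CopyCounter`, `receive_zero_copy`, `receive_single_copy` and `receive_two_copy` are hypothetical, adapter memory is stood in for by a `bytearray`, and each CPU copy is simply counted rather than timed.

```python
from dataclasses import dataclass


@dataclass
class CopyCounter:
    """Hypothetical helper that tallies simulated CPU copies."""
    copies: int = 0
    bytes_copied: int = 0

    def copy(self, src) -> bytearray:
        # Simulate one CPU copy of a buffer and record its cost.
        self.copies += 1
        self.bytes_copied += len(src)
        return bytearray(src)


def receive_zero_copy(adapter_mem: bytearray, counter: CopyCounter) -> memoryview:
    # Assumption: mapping adapter memory into user space is modelled as a
    # view onto the same bytes. No copy is made, so the application sees
    # (and must manage) the adapter buffer itself.
    return memoryview(adapter_mem)


def receive_single_copy(adapter_mem: bytearray, counter: CopyCounter) -> bytearray:
    # One copy: adapter memory straight into the user buffer.
    return counter.copy(adapter_mem)


def receive_two_copy(adapter_mem: bytearray, counter: CopyCounter) -> bytearray:
    # Two copies: adapter memory into an intermediate kernel buffer,
    # then the kernel buffer into the user buffer.
    kernel_buf = counter.copy(adapter_mem)
    return counter.copy(kernel_buf)


if __name__ == "__main__":
    packet = bytearray(b"\x00" * 1500)  # illustrative packet size
    for name, path in [
        ("zero-copy", receive_zero_copy),
        ("single-copy", receive_single_copy),
        ("two-copy", receive_two_copy),
    ]:
        counter = CopyCounter()
        path(packet, counter)
        print(name, counter.copies, counter.bytes_copied)
```

In this model the zero-copy path hands the application a view that aliases the adapter buffer, which mirrors why buffer management moves up into the application, whereas the other two paths return private copies at the price of one or two passes over the data.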