This paper describes a new feature in Solaris that combines virtual memory remapping with checksumming support from the networking hardware to eliminate data-touching overhead from the TCP/IP protocol stack. By implementing page remapping operations at the right level of the operating system, and by caching MMU mappings to take advantage of locality of reference, the feature attains a significant performance gain on certain hardware platforms. Nevertheless, the improvement over CPU copying varies with the host memory cache architecture, the MMU design, and application behavior. We begin by comparing different zero-copy schemes and explaining our preference for page remapping and copy-on-write (COW) techniques. We then describe our implementation and present its performance characteristics under a number of different parameters. We conclude with ideas for future improvements.
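As a rough illustration of the page remapping and copy-on-write idea, the following toy model in Python may help. It is a sketch under stated assumptions, not the Solaris implementation: the names `Frame`, `AddressSpace`, `send_by_remap`, `release`, and `demo` are all hypothetical, and real MMU state, hardware checksumming, and mapping caches are not modeled. It only shows that a send can share a page frame instead of copying it, and that a copy happens only if the sender writes while the frame is still shared.

```python
class Frame:
    """A physical page frame that one or more mappings may share."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.refs = 1  # number of mappings that point at this frame


class AddressSpace:
    """Hypothetical address space: virtual page number -> [Frame, writable]."""

    def __init__(self, name):
        self.name = name
        self.table = {}
        self.copies = 0  # how many COW copies this address space triggered

    def map_new(self, vpn, data):
        self.table[vpn] = [Frame(data), True]

    def read(self, vpn):
        return bytes(self.table[vpn][0].data)

    def write(self, vpn, offset, payload):
        entry = self.table[vpn]
        frame, writable = entry
        if not writable:
            # Simulated protection fault on a write-protected page.
            if frame.refs > 1:
                # Frame is still shared: copy it so the other mapping
                # keeps seeing the original bytes.
                frame.refs -= 1
                frame = Frame(frame.data)
                entry[0] = frame
                self.copies += 1
            # Frame is now private, so writes can proceed directly.
            entry[1] = True
        frame.data[offset:offset + len(payload)] = payload


def send_by_remap(sender, vpn, kernel, kvpn):
    """'Send' a page by sharing its frame instead of copying the bytes."""
    frame = sender.table[vpn][0]
    frame.refs += 1
    sender.table[vpn][1] = False         # write-protect the sender's mapping
    kernel.table[kvpn] = [frame, False]  # kernel maps the same frame


def release(space, vpn):
    """Drop a mapping, e.g. once the data has been transmitted."""
    frame, _ = space.table.pop(vpn)
    frame.refs -= 1


def demo():
    app, kern = AddressSpace("app"), AddressSpace("kernel")
    app.map_new(0, b"hello world")
    send_by_remap(app, 0, kern, 7)
    app.write(0, 0, b"J")  # write while shared: triggers one COW copy
    return app.read(0), kern.read(7), app.copies
```

In this model, a sender that leaves the buffer untouched until the kernel calls `release` never pays for a copy, which is one way to see why the abstract says the benefit depends on application behavior.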