Recent I/O technologies such as PCI Express and 10 Gb Ethernet enable unprecedented levels of I/O bandwidth in mainstream platforms. However, in traditional architectures, memory latency alone can prevent processors from keeping pace with 10 Gb/s of inbound network I/O traffic. We propose a platform-wide method called Direct Cache Access (DCA) to deliver inbound I/O data directly into processor caches. We demonstrate that DCA significantly reduces memory latency and memory bandwidth demand for receive-intensive network I/O applications. Analysis of benchmarks such as SPECweb99, TPC-W, and TPC-C shows that the overall benefit depends on the relative volume of I/O traffic to memory traffic as well as on the spatial and temporal relationship between processor and I/O memory accesses. A system-level perspective on the efficient implementation of DCA is presented.