It is well recognized that moving I/O data into and out of memory has become a critical cost for high-bandwidth devices. Embedded systems in particular, with their limited cache sizes and simple architectures, spend a large share of CPU cycles on off-chip memory accesses. The work presented in this paper addresses this problem through an Affinity-aware DMA Buffer management strategy, called ADB, that requires no change to the underlying hardware. We introduce the concept of buffer affinity, which describes where the data of a recently released DMA buffer resides in the memory hierarchy: the more of its data that remains in cache, the higher the buffer's affinity. Exploiting the characteristics of embedded systems, we identify buffer affinity at runtime. Using this online profiling, ADB allocates buffers of appropriate affinity. For output, ADB allocates a high-affinity buffer to reduce off-chip memory accesses when the OS copies data from the user buffer to the kernel buffer. For input, ADB allocates a low-affinity buffer to skip part of the cache invalidation operations needed to maintain I/O coherence. Measurements show that ADB, implemented in the Linux 2.6.32 kernel and running on a 1 GHz UniCore-2 processor, improves the performance of network-related programs by 5.2% to 8.8%.