Bulk data movement occurs commonly in server workloads, and its performance is rather poor on today's microprocessors. We propose the use of small dedicated copy engines and present a detailed analysis of a bulk data copy engine architecture. We describe the hardware support required to implement the copy engine and to tightly integrate it into server platforms. Our evaluation is based on an execution-driven simulator that was extended with detailed models of bulk data movement engines. The simulation results show that dedicated engines are quite effective in eliminating the data movement overhead and are an attractive choice for handling bulk data in future high-performance server platforms.