Hardware Support for Accelerating Data Movement in Server Platform

Authors:
Li Zhao;Laxmi N. Bhuyan;Ravi Iyer;Srihari Makineni;Donald Newell
Venue:
IEEE Transactions on Computers
Year:
2007

Abstract

Data movement, in the form of memory copies, is a very common operation during network processing and application execution on servers. This operation performs poorly on today's microprocessors for three reasons: 1) several long-latency memory accesses are involved because the source and/or the destination typically reside in memory, 2) latency-hiding techniques, such as out-of-order execution, hardware threading, and prefetching, are not very effective for bulk data movement, and 3) microprocessors move data at register (small) granularity. In this paper, we quantify the overhead of bulk data movement and propose the use of dedicated copy engines to minimize it. We present a detailed analysis of copy engine architectures along two dimensions: 1) on-die versus off-die and 2) synchronous versus asynchronous. These copy engine architectures are superior to traditional Direct Memory Access (DMA) engines because they are tightly coupled to the core architecture and enable lower-overhead communication and signaling. We describe the hardware support required to implement these copy engines and integrate them into server platforms. We then present a detailed case study that evaluates their performance, using an execution-driven simulator extended with detailed models of copy engines. Our simulation results show that copy engines are effective in reducing bulk data movement overhead and, hence, hold significant promise for high-performance server platforms.
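The distinction the abstract draws between core-driven copies and synchronous or asynchronous copy engines can be illustrated with a toy cycle-count model. This is only a minimal sketch and not the paper's simulator. Every function name and every parameter value (cache-line size, miss latency, setup cost, per-line engine cost) is a hypothetical assumption chosen for illustration, not a figure from the paper.

```python
import math

LINE_BYTES = 64  # assumed cache-line size (illustrative, not from the paper)


def core_copy_cycles(nbytes, reg_bytes=8, miss_latency=200):
    """Toy cost of a core-driven copy.

    Assumes one load and one store per register-sized chunk, plus one
    fully exposed memory miss per cache line touched.
    """
    ops = math.ceil(nbytes / reg_bytes)
    lines = math.ceil(nbytes / LINE_BYTES)
    return 2 * ops + lines * miss_latency


def engine_transfer_cycles(nbytes, cycles_per_line=20):
    """Toy cost of a copy engine moving data at cache-line granularity."""
    return math.ceil(nbytes / LINE_BYTES) * cycles_per_line


def sync_total(nbytes, other_work, setup=50):
    """Synchronous engine: the core waits for the copy, then does other work."""
    return setup + engine_transfer_cycles(nbytes) + other_work


def async_total(nbytes, other_work, setup=50):
    """Asynchronous engine: the core issues the copy, then overlaps other work."""
    return setup + max(engine_transfer_cycles(nbytes), other_work)


if __name__ == "__main__":
    size, work = 4096, 1000  # hypothetical copy size (bytes) and independent work (cycles)
    print("core-driven copy only:", core_copy_cycles(size))
    print("synchronous engine + work:", sync_total(size, work))
    print("asynchronous engine + work:", async_total(size, work))
```

In this model the asynchronous case only helps when the core has independent work to overlap with the transfer; with `other_work=0` the synchronous and asynchronous totals are identical.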