On causes of GridFTP transfer throughput variance

Authors:
Zhengyang Liu; Malathi Veeraraghavan; Jianhui Zhou; Jason Hick; Yee-Ting Li
Affiliations:
University of Virginia, Charlottesville, VA; University of Virginia, Charlottesville, VA; University of Virginia, Charlottesville, VA; Lawrence Berkeley National Laboratory, Berkeley, CA; SLAC National Accelerator Laboratory, Menlo Park, CA
Venue:
NDM '13 Proceedings of the Third International Workshop on Network-Aware Data Management
Year:
2013

Abstract

In prior work, we analyzed GridFTP usage logs collected by data transfer nodes (DTNs) at national scientific computing centers and found significant throughput variance, even among transfers between the same two end hosts. The goal of this work is to quantify the impact of various factors on that variance. Our methodology consisted of executing experiments on a high-speed research testbed, running large instrumented transfers between operational DTNs, and building statistical models from the collected measurements. We created a non-linear regression model for memory-to-memory transfer throughput as a function of CPU usage at the two DTNs and packet loss rate. The model is useful for determining the concomitant resource allocations to use when scheduling requests. For example, if a whole CPU core at the NERSC DTN can be assigned to the GridFTP process executing a large memory-to-memory transfer to SLAC, then only 32% of a CPU core is required at the SLAC DTN for the corresponding GridFTP process, because the two DTNs differ in computing speed. With these CPU allocations, data can be moved at 6.3 Gbps, which is therefore the rate to request from the circuit scheduler.
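
The NERSC-to-SLAC example can be made concrete with a small sketch. It is not the paper's non-linear regression model, whose form and coefficients are not given in the abstract. It assumes, purely for illustration, that throughput grows linearly with the CPU share at each DTN, that the slower end is the bottleneck, and that packet loss is negligible. The function names and the per-core capacities are hypothetical; the capacities are back-calculated from the three figures quoted in the abstract (one core, 32% of a core, 6.3 Gbps).

```python
# Illustrative only: a linear bottleneck approximation, NOT the paper's
# non-linear regression model. Packet loss is ignored.

TARGET_GBPS = 6.3        # rate quoted in the abstract
NERSC_CORE_SHARE = 1.0   # whole core at the NERSC DTN (from the abstract)
SLAC_CORE_SHARE = 0.32   # matching share at the SLAC DTN (from the abstract)

# Assumed per-core capacities implied by those figures under a linear model.
NERSC_GBPS_PER_CORE = TARGET_GBPS / NERSC_CORE_SHARE
SLAC_GBPS_PER_CORE = TARGET_GBPS / SLAC_CORE_SHARE


def achievable_rate(nersc_share: float, slac_share: float) -> float:
    """Rate in Gbps, limited by whichever DTN runs out of CPU first."""
    return min(nersc_share * NERSC_GBPS_PER_CORE,
               slac_share * SLAC_GBPS_PER_CORE)


def matching_slac_share(nersc_share: float) -> float:
    """SLAC CPU share that keeps pace with a given NERSC CPU share."""
    return nersc_share * NERSC_GBPS_PER_CORE / SLAC_GBPS_PER_CORE
```

Under these assumptions, achievable_rate(1.0, 0.32) gives the rate that would be requested from the circuit scheduler, and giving the SLAC process more than matching_slac_share(1.0) of a core would not raise the rate, since the NERSC side is already saturated.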