Using overlays for efficient data transfer over shared wide-area networks

Authors:
Gaurav Khanna (The Ohio State University); Umit Catalyurek (The Ohio State University); Tahsin Kurc (The Ohio State University); Rajkumar Kettimuthu (Argonne National Laboratory); P. Sadayappan (The Ohio State University); Ian Foster (Argonne National Laboratory); Joel Saltz (The Ohio State University)
Venue:
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Year:
2008

Abstract

Data-intensive applications frequently transfer large amounts of data over wide-area networks. Performance in such settings can often be improved by routing data through intermediate nodes chosen to increase aggregate bandwidth. We explore the benefits of overlay network approaches by designing and implementing a service-oriented architecture that incorporates two key optimizations, multi-hop path splitting and multi-pathing, within the GridFTP file transfer protocol. We develop a file transfer scheduling algorithm that combines these two optimizations with the use of available file replicas. The algorithm uses information from past GridFTP transfers to estimate network bandwidths and resource availability. We evaluate the effectiveness of these optimizations on a wide-area testbed using several application file transfer patterns: one-to-all broadcast, all-to-one gather, and data redistribution. The experimental results show that our architecture and algorithm achieve significant performance improvements.
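As a rough illustration of the two optimizations, here is a minimal sketch under simplifying assumptions. It is not the scheduling algorithm from the paper. Link bandwidths are estimated as the mean of past observed transfer throughputs. A multi-hop route is scored by its slowest hop, on the assumption that each hop runs as a separate connection, and the best route is chosen with a max-bottleneck (widest-path) variant of Dijkstra's algorithm. Multi-pathing then splits a file across paths in proportion to their bandwidths, assuming the paths share no bottleneck link. The function names and the `(src, dst, megabytes, seconds)` history format are hypothetical.

```python
import heapq
from collections import defaultdict
from statistics import mean

def estimate_bandwidths(history):
    """Estimate per-link bandwidth (MB/s) as the mean of past observed throughputs.

    `history` is a hypothetical log of (src, dst, megabytes, seconds) records,
    standing in for statistics gathered from earlier transfers.
    """
    samples = defaultdict(list)
    for src, dst, mb, secs in history:
        samples[(src, dst)].append(mb / secs)
    return {link: mean(vals) for link, vals in samples.items()}

def widest_path(bw, source, dest):
    """Return (bottleneck_bw, path) maximizing the slowest hop's bandwidth.

    Simplified model of multi-hop path splitting: each hop is a separate
    connection, so the end-to-end rate is limited by the slowest hop.
    """
    adj = defaultdict(list)
    for (u, v), w in bw.items():
        adj[u].append((v, w))
    best = {source: float("inf")}
    heap = [(-float("inf"), source, [source])]  # negated widths give a max-heap
    while heap:
        neg_width, node, path = heapq.heappop(heap)
        width = -neg_width
        if node == dest:
            return width, path
        if width < best.get(node, 0.0):
            continue  # stale entry: a wider route to this node was already found
        for nxt, link_bw in adj[node]:
            cand = min(width, link_bw)
            if cand > best.get(nxt, 0.0):
                best[nxt] = cand
                heapq.heappush(heap, (-cand, nxt, path + [nxt]))
    return 0.0, []  # dest is unreachable

def split_across_paths(total_mb, path_bws):
    """Split total_mb in proportion to each path's bandwidth so that all paths
    finish at the same time. Assumes the paths share no bottleneck link."""
    total_bw = sum(path_bws)
    shares = [total_mb * b / total_bw for b in path_bws]
    return shares, total_mb / total_bw

if __name__ == "__main__":
    history = [
        ("A", "C", 100, 10),  # direct A->C observed at 10 MB/s
        ("A", "C", 100, 10),
        ("A", "B", 300, 10),  # A->B observed at 30 MB/s
        ("B", "C", 200, 10),  # B->C observed at 20 MB/s
    ]
    bw = estimate_bandwidths(history)
    print(widest_path(bw, "A", "C"))  # (20.0, ['A', 'B', 'C'])
    print(split_across_paths(600, [bw[("A", "C")], 20.0]))  # ([200.0, 400.0], 20.0)
```

In the example, relaying through B gives a 20 MB/s bottleneck, which beats the 10 MB/s direct link. Sending over both the direct and the relayed path at once cuts the 600 MB transfer from 30 s on the best single path to 20 s. On a real shared network, concurrent paths may contend for the same links or end hosts, so this idealized proportional split may overstate the gain.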