A Data Throughput Prediction and Optimization Service for Widely Distributed Many-Task Computing

Authors:
Dengpan Yin;Esma Yildirim;Sivakumar Kulasekaran;Brandon Ross;Tevfik Kosar
Affiliations:
Louisiana State University, Baton Rouge;Louisiana State University, Baton Rouge;Louisiana State University, Baton Rouge;Louisiana State University, Baton Rouge;Louisiana State University, Baton Rouge
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2011

Abstract

In this paper, we present the design and implementation of an application-layer data throughput prediction and optimization service for many-task computing in widely distributed environments. This service uses multiple parallel TCP streams to improve the end-to-end throughput of data transfers. We develop a novel mathematical model to determine the number of parallel streams required to achieve the best network performance. This model can predict the optimal number of parallel streams from as few as three prediction points. We implement this new service in the Stork Data Scheduler, where the prediction points can be obtained using Iperf and GridFTP sampling. Our results show that, in most cases, the prediction cost plus the optimized transfer time is much less than the nonoptimized transfer time. As a result, Stork data transfer jobs that use the optimization service can complete much earlier than nonoptimized data transfer jobs.
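
To make the idea of predicting from three points concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not state the model, so the sketch assumes one throughput curve of the form Th(n) = n / sqrt(a*n^2 + b*n + c), where n is the number of parallel streams. Under that assumption, each sampled pair (n, Th) gives one linear equation in a, b, and c, so three samples determine the curve. The function names, the `max_streams` cap, and the sample values are all hypothetical.

```python
import math


def fit_three_points(samples):
    """Fit a, b, c of the ASSUMED model Th(n) = n / sqrt(a*n^2 + b*n + c).

    samples: exactly three (stream_count, measured_throughput) pairs.
    Each pair yields the linear equation a*n^2 + b*n + c = (n / Th)^2.
    """
    if len(samples) != 3:
        raise ValueError("exactly three samples are required")
    # Augmented 3x4 matrix of the linear system.
    rows = [[n * n, n, 1.0, (n / th) ** 2] for n, th in samples]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        if abs(rows[col][col]) < 1e-12:
            raise ValueError("samples must use three distinct stream counts")
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return tuple(rows[i][3] / rows[i][i] for i in range(3))


def predicted_throughput(n, a, b, c):
    """Throughput predicted by the assumed model for n streams."""
    return n / math.sqrt(a * n * n + b * n + c)


def optimal_streams(a, b, c, max_streams=64):
    """Return the integer stream count in [1, max_streams] with the highest
    predicted throughput (max_streams is a hypothetical search cap)."""
    valid = [n for n in range(1, max_streams + 1) if a * n * n + b * n + c > 0]
    if not valid:
        raise ValueError("model is undefined over the whole search range")
    return max(valid, key=lambda n: predicted_throughput(n, a, b, c))


if __name__ == "__main__":
    # Hypothetical throughput samples (arbitrary units) at 1, 2, and 4 streams.
    samples = [(1, 3.3), (2, 5.9), (4, 8.6)]
    a, b, c = fit_three_points(samples)
    print("predicted optimal stream count:", optimal_streams(a, b, c))
```

In a deployment like the one the abstract describes, the three samples would come from short Iperf or GridFTP sampling runs rather than hard-coded values, and the time spent on those runs is the prediction cost that the optimized transfer has to recoup.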