End-to-End Data-Flow Parallelism for Throughput Optimization in High-Speed Networks

Authors:
Esma Yildirim; Tevfik Kosar
Affiliations:
State University of New York, Buffalo, USA; State University of New York, Buffalo, USA
Venue:
Journal of Grid Computing
Year:
2012

Abstract

The increase in the data produced by large-scale scientific applications necessitates innovative solutions for efficient data transfer. Although current optical networking technology has reached theoretical speeds of 100 Gbps, applications still suffer from inefficient transport protocols and bottlenecks on the end systems (e.g., disk, CPU, NIC). High-performance systems provide parallel disks, processors, and network interfaces. However, the lack of orchestration of these end-system resources with the available network capacity results in underutilization of the network bandwidth. In this study, a model and two algorithms that use "end-to-end data-flow parallelism" to optimize the use of network and end-system resources are proposed. This is achieved by using multiple parallel streams over the network and multiple parallel disks and CPUs at the end systems. Our model predicts the optimal number of streams and disk/CPU stripes that maximize the data transfer speed for any setting. Our algorithms use GridFTP parallel samplings and calculate the optimal level of parallelism based on our prediction model. Experiments conducted using actual GridFTP transfers show that the predictions made by our model and algorithms provide close-to-optimal performance with negligible overhead while using a minimal number of resources. In the presence of end-system bottlenecks, the end-to-end data transfer throughput is improved dramatically compared to non-optimized transfers.
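The abstract does not give the prediction model or the two algorithms, so the following is only a minimal sketch of the general idea of sampling-based selection, not the authors' method. It assumes throughput has already been sampled at a few stream counts and picks the smallest count whose throughput is within a tolerance of the best sample, which reflects the stated goal of close-to-optimal performance with a minimal number of resources. The function name `pick_parallelism`, the `tolerance` parameter, and the sample values are all hypothetical.

```python
from typing import Dict


def pick_parallelism(samples: Dict[int, float], tolerance: float = 0.05) -> int:
    """Return the smallest stream count whose sampled throughput is
    within `tolerance` (a fraction) of the best sampled throughput.

    Hypothetical helper for illustration; not the paper's algorithm.
    """
    if not samples:
        raise ValueError("need at least one (streams, throughput) sample")
    if not 0.0 <= tolerance < 1.0:
        raise ValueError("tolerance must be in [0, 1)")

    best = max(samples.values())
    threshold = best * (1.0 - tolerance)
    # Prefer fewer streams when the throughput is close enough to the best.
    return min(n for n, th in samples.items() if th >= threshold)


# Made-up sample values in Mbps, keyed by stream count.
# These are illustrative only, not measurements from the paper.
samples = {1: 400.0, 2: 760.0, 4: 1350.0, 8: 1900.0, 16: 1960.0, 32: 1890.0}

chosen = pick_parallelism(samples)
print(chosen)  # 8: the fewest streams within 5% of the best sample
```

With a tolerance of zero the same function returns the stream count with the highest sampled throughput, so the tolerance is what trades a small amount of throughput for fewer resources.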