A data transfer framework for large-scale science experiments

  • Authors:
  • Wantao Liu; Brian Tieman; Rajkumar Kettimuthu; Ian Foster

  • Affiliations:
  • Beihang University, Beijing, China and The University of Chicago, Chicago, IL; Argonne National Laboratory, Argonne, IL; Argonne National Laboratory, Argonne, IL and The University of Chicago, Chicago, IL; The University of Chicago, Chicago, IL and Argonne National Laboratory, Argonne, IL

  • Venue:
  • Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
  • Year:
  • 2010

Abstract

Modern scientific experiments can generate hundreds of gigabytes to terabytes or even petabytes of data, often stored as large numbers of relatively small files. Frequently, this data must be disseminated to remote collaborators or computational centers for analysis. Moving this data quickly and robustly while giving users a simple interface is challenging. We present a data transfer framework comprising a high-performance data transfer library based on GridFTP, a data scheduler, and a graphical user interface that allows users to transfer their data easily, reliably, and securely. The system incorporates automatic tuning mechanisms that select, at runtime, the number of concurrent threads used for transfers. It also includes restart mechanisms capable of dealing with client, network, and server failures. Experimental results indicate that our data transfer system can significantly improve data transfer performance and can recover well from failures.
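
The abstract does not say how the thread count is chosen or how restarts are tracked. As a rough illustration only, the sketch below shows one plausible approach: a simple hill-climbing search that adds threads while measured throughput keeps improving, plus a helper that skips files already recorded as complete when a transfer is restarted. The names `tune_concurrency`, `pending_files`, `measure_throughput`, `max_threads`, and `min_gain` are hypothetical and are not taken from the paper or from any GridFTP API.

```python
from typing import Callable, Iterable, List, Set


def tune_concurrency(measure_throughput: Callable[[int], float],
                     start: int = 1,
                     max_threads: int = 16,
                     min_gain: float = 0.05) -> int:
    """Pick a thread count by hill climbing (an assumed strategy, not the paper's).

    measure_throughput(n) is a caller-supplied probe that returns the
    observed transfer rate when n concurrent threads are used.
    Threads are added one at a time until the relative improvement
    falls below min_gain or max_threads is reached.
    """
    best_n = start
    best_rate = measure_throughput(start)
    n = start
    while n < max_threads:
        n += 1
        rate = measure_throughput(n)
        if rate < best_rate * (1.0 + min_gain):
            break  # the extra thread did not help enough; stop climbing
        best_n, best_rate = n, rate
    return best_n


def pending_files(all_files: Iterable[str], completed: Set[str]) -> List[str]:
    """Return the files still to be sent after a restart.

    `completed` stands in for a persistent record of finished files
    (hypothetical; the abstract does not describe the actual mechanism).
    """
    return [f for f in all_files if f not in completed]


if __name__ == "__main__":
    # Synthetic probe: throughput scales linearly up to 4 threads, then flattens.
    def fake_probe(n: int) -> float:
        return min(n, 4) * 100.0

    print(tune_concurrency(fake_probe))
    print(pending_files(["a.dat", "b.dat", "c.dat"], {"b.dat"}))
```

In a real deployment the probe would time a short batch of actual transfers, and the completed-file record would need to survive a client crash, for example by being appended to a file on disk after each successful transfer.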