Globus XIO pipe open driver: enabling GridFTP to leverage standard Unix tools

  • Authors:
  • Rajkumar Kettimuthu;Steven Link;John Bresnahan;Michael Link;Ian Foster

  • Affiliations:
  • Computation Institute, Argonne National Lab. & U. Chicago, Argonne, IL;Northern Illinois University, DeKalb, IL;Computation Institute, Argonne National Lab. & U. Chicago, Argonne, IL;Computation Institute, Argonne National Lab. & U. Chicago, Argonne, IL;Computation Institute, Argonne National Lab. & U. Chicago, Argonne, IL and University of Chicago, Chicago, IL

  • Venue:
  • Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery
  • Year:
  • 2011

Abstract

Scientific research generates very large volumes of data throughout discovery and analysis. Because data must be shared and relocated, researchers often lose productivity in proportion to the time spent transferring it. The GridFTP protocol was developed to improve this situation by addressing the performance, reliability, and security limitations of standard FTP and other commonly used data movement tools such as SCP. The Globus implementation of GridFTP is widely used to move data rapidly and reliably between geographically distributed systems. GridFTP has traditionally performed well for datasets containing large files. When the data is partitioned into many small files, however, it suffers from lower transfer rates. Although the pipelining and concurrency features in GridFTP improve transfer rates for datasets made up of many small files, they cannot be applied in environments that have strict firewall rules. In some cases, tarring the files in a dataset on the fly helps; in other cases, a checksum of the files after they are written to disk is desired. In this paper, we present the Globus XIO Pipe Open Driver, which enables GridFTP to leverage standard Unix tools to perform these tasks. We demonstrate the effectiveness of this functionality through several experiments.
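
The general idea of reading from a Unix tool instead of a file can be illustrated with a minimal Python sketch. This is not the Globus XIO driver or its API; the function names `read_from_command` and `tar_stream_with_checksum` are hypothetical, and the sketch assumes a `tar` binary is available on the PATH. It streams a directory as a single tar archive through a pipe and computes a checksum of the stream as it passes.

```python
import hashlib
import subprocess


def read_from_command(argv, chunk_size=64 * 1024):
    """Yield the stdout of a child process chunk by chunk, as if reading a file.

    Hypothetical helper: it stands in for "open a pipe to a program instead of
    opening a file", which is the concept the abstract describes.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    try:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        proc.stdout.close()
        status = proc.wait()
    if status != 0:
        raise RuntimeError(f"{argv[0]} exited with status {status}")


def tar_stream_with_checksum(directory):
    """Tar a directory on the fly and return (archive_bytes, sha256_hex).

    Assumes `tar` is installed. The archive is buffered in memory here only to
    keep the sketch short; a real transfer would forward each chunk onward.
    """
    digest = hashlib.sha256()
    payload = bytearray()
    for chunk in read_from_command(["tar", "-cf", "-", "-C", directory, "."]):
        digest.update(chunk)
        payload.extend(chunk)
    return bytes(payload), digest.hexdigest()
```

Sending many small files as one tar stream means the receiver sees a single large transfer, which is the case the abstract says GridFTP already handles well.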