Data pipelines: enabling large scale multi-protocol data transfers

  • Authors:
  • Tevfik Kosar; George Kola; Miron Livny

  • Affiliations:
  • University of Wisconsin-Madison, Madison WI; University of Wisconsin-Madison, Madison WI; University of Wisconsin-Madison, Madison WI

  • Venue:
  • MGC '04: Proceedings of the 2nd Workshop on Middleware for Grid Computing
  • Year:
  • 2004

Abstract

Collaborating users need to move terabytes of data among their sites, often over multiple protocols. This process is fragile and requires considerable human intervention to deal with failures. In this work, we propose data pipelines, an automated system for transferring data among collaborating sites. The system speaks multiple protocols, provides sophisticated flow control, and recovers automatically from network, storage system, software, and hardware failures. We successfully used data pipelines to transfer three terabytes of DPOSS data from the SRB mass storage server at the San Diego Supercomputer Center to the UniTree mass storage system at NCSA. The whole process required no human intervention, and the data pipeline recovered automatically from various network, storage system, software, and hardware failures.
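
The abstract does not describe how the recovery works, so the following is only a minimal sketch of the general idea of a multi-hop, multi-protocol transfer that retries on failure without human help. It is not the authors' implementation. Every name in it (run_hop, run_pipeline, TransferFailed, the protocol labels, and the retry parameters) is hypothetical, and each transfer function is assumed to raise an exception when a transfer fails.

```python
import time
from typing import Callable, List, Sequence, Tuple

# Hypothetical: a transfer function moves one file across one hop
# and raises an exception on any failure.
TransferFn = Callable[[str], None]
Protocol = Tuple[str, TransferFn]  # (label, transfer function)


class TransferFailed(Exception):
    """Raised when every protocol and every retry round for a hop has failed."""


def run_hop(filename: str,
            protocols: Sequence[Protocol],
            max_rounds: int = 3,
            base_delay: float = 0.0) -> str:
    """Try each protocol in order, repeating the list up to max_rounds times.

    Returns the label of the protocol that succeeded.
    """
    for round_no in range(max_rounds):
        for label, transfer in protocols:
            try:
                transfer(filename)
                return label
            except Exception:
                # Treat any error as transient and fall through
                # to the next protocol in the list.
                continue
        # Every protocol failed this round: back off exponentially.
        time.sleep(base_delay * (2 ** round_no))
    raise TransferFailed(filename)


def run_pipeline(filenames: Sequence[str],
                 hops: Sequence[Sequence[Protocol]]) -> Tuple[List[str], List[str]]:
    """Push each file through every hop in order.

    A file that exhausts its retries is recorded as failed, and the
    pipeline moves on to the next file instead of stopping.
    """
    done: List[str] = []
    failed: List[str] = []
    for name in filenames:
        try:
            for hop in hops:
                run_hop(name, hop)
            done.append(name)
        except TransferFailed:
            failed.append(name)
    return done, failed
```

In this sketch a hop is a list of alternative protocols for one leg of the journey (for example, source storage to a staging disk), so a failure in one protocol falls back to the next before the whole leg is retried. A real system would also need to persist progress so that a crash of the pipeline itself does not lose track of completed files; that is omitted here.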