New Algorithms for Planning Bulk Transfer via Internet and Shipping Networks

  • Authors:
  • Brian Cho; Indranil Gupta

  • Venue:
  • ICDCS '10 Proceedings of the 2010 IEEE 30th International Conference on Distributed Computing Systems
  • Year:
  • 2010

Abstract

Cloud computing is enabling groups of academic collaborators, business partners, and others to come together in an ad hoc manner. This paper focuses on the group-based data transfer problem in such settings. Each participant source site in such a group has a large dataset, which may range in size from gigabytes to terabytes. This data needs to be transferred to a single sink site (e.g., AWS or Google datacenters) in a manner that reduces both the total dollar cost incurred by the group and the total transfer latency of the collective dataset. This paper is the first to explore the problem of planning a group-based, deadline-oriented data transfer in a scenario where data can be sent both (1) over the internet and (2) by shipping storage devices (e.g., external or hot-plug drives, or SSDs) via companies such as FedEx, UPS, and USPS. We first formalize the problem and prove its NP-hardness. Then, we propose novel algorithms and use them to build a planning system called Pandora (People and Networks Moving Data Around). Pandora uses new concepts of time-expanded networks and delta-time-expanded networks, combining them with integer programming techniques and optimizations for both shipping and internet edges. Our experimental evaluation, using real data from FedEx and from PlanetLab, indicates that the Pandora planner satisfies deadlines and reduces costs significantly.
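
To make the idea of a time-expanded network more concrete, the following is a minimal Python sketch of how such a graph might be built. It is not Pandora's implementation: the names `Edge` and `time_expand`, the tuple layouts, and the cost model (a per-GB price on internet edges, a flat per-shipment price on shipping edges, and uncapacitated shipments) are all assumptions made for illustration. Each site is copied once per time step; data can wait at a site, cross an internet link in one step, or travel by shipment over several steps.

```python
from collections import namedtuple

# Hypothetical edge record: nodes are (site, time_step) pairs.
Edge = namedtuple("Edge", "src dst capacity_gb cost_per_gb fixed_cost")

def time_expand(sites, internet_links, shipping_links, horizon):
    """Build the edges of a time-expanded graph over steps 0..horizon.

    internet_links: (u, v, gb_per_step, dollars_per_gb) tuples (assumed layout)
    shipping_links: (u, v, transit_steps, dollars_per_shipment) tuples (assumed layout)
    """
    inf = float("inf")
    edges = []
    for t in range(horizon):
        # Holdover edges: data may simply wait at a site for one step.
        for s in sites:
            edges.append(Edge((s, t), (s, t + 1), inf, 0.0, 0.0))
        # Internet edges: limited capacity per step, linear cost.
        for u, v, gb, price in internet_links:
            edges.append(Edge((u, t), (v, t + 1), gb, price, 0.0))
        # Shipping edges: span several steps and carry a fixed cost.
        # Only shipments that arrive by the horizon (the deadline) are kept.
        for u, v, transit, price in shipping_links:
            if t + transit <= horizon:
                edges.append(Edge((u, t), (v, t + transit), inf, 0.0, price))
    return edges

# Toy instance with made-up numbers: one source, one sink, 3 time steps.
edges = time_expand(
    sites=["src", "sink"],
    internet_links=[("src", "sink", 10.0, 0.10)],
    shipping_links=[("src", "sink", 2, 50.0)],
    horizon=3,
)
print(len(edges))
```

In this sketch the horizon plays the role of the deadline: a plan is a flow from the source copies at step 0 to the sink copy at the final step. The fixed costs on shipping edges are what take the problem beyond ordinary minimum-cost flow, which is consistent with the abstract's use of integer programming.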