Scheduling file transfers for data-intensive jobs on heterogeneous clusters

  • Authors:
  • Gaurav Khanna;Umit Catalyurek;Tahsin Kurc;P. Sadayappan;Joel Saltz

  • Affiliations:
  • Dept. of Computer Science and Engineering, The Ohio State University;Dept. of Biomedical Informatics, The Ohio State University;Dept. of Biomedical Informatics, The Ohio State University;Dept. of Computer Science and Engineering, The Ohio State University;Dept. of Biomedical Informatics, The Ohio State University

  • Venue:
  • Euro-Par'07: Proceedings of the 13th International Euro-Par Conference on Parallel Processing
  • Year:
  • 2007

Abstract

This paper addresses the problem of efficiently scheduling, as a collective, the file transfers requested by a batch of tasks. Our work targets a heterogeneous collection of storage and compute clusters. The goal is to minimize the overall time needed to transfer the files to their respective destination nodes. Two scheduling schemes are proposed and experimentally evaluated against an existing approach, Insertion Scheduling. The first is a 0-1 Integer Programming formulation based on the idea of time-expanded networks. This scheme achieves the minimum total file transfer time but incurs significant scheduling overhead. To address this issue, we propose a heuristic based on maximum weight graph matching. The heuristic performs as well as Insertion Scheduling while having much lower scheduling overhead. We conclude that the heuristic scheme is a better fit for larger workloads and systems.
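
To make the idea of a matching-based heuristic more concrete, the following is a minimal Python sketch and not the authors' algorithm. It rests on assumptions that the abstract does not state: each node sends or receives at most one file per round, an edge's weight is the size of the largest pending file between a source and a destination, and the matching is found by exhaustive search, which is only practical for small instances. The names `max_weight_matching` and `schedule_rounds` are hypothetical.

```python
from functools import lru_cache


def max_weight_matching(weights):
    """Return (src, dst) pairs with no node repeated and maximum total weight.

    weights: dict mapping (src, dst) -> positive weight.
    Exhaustive search with memoization; intended for small examples only.
    """
    srcs = sorted({s for s, _ in weights})
    dsts = sorted({d for _, d in weights})
    dst_bit = {d: 1 << i for i, d in enumerate(dsts)}

    @lru_cache(maxsize=None)
    def best(i, used):
        if i == len(srcs):
            return 0.0, ()
        # Option 1: leave source i idle in this round.
        best_w, best_pairs = best(i + 1, used)
        # Option 2: pair source i with any free destination it has an edge to.
        for d in dsts:
            w = weights.get((srcs[i], d))
            if w is None or used & dst_bit[d]:
                continue
            sub_w, sub_pairs = best(i + 1, used | dst_bit[d])
            if w + sub_w > best_w:
                best_w = w + sub_w
                best_pairs = ((srcs[i], d),) + sub_pairs
        return best_w, best_pairs

    return list(best(0, 0)[1])


def schedule_rounds(transfers):
    """Group transfers into rounds, one maximum weight matching per round.

    transfers: list of (file_id, src, dst, size) tuples with size > 0.
    Assumption (not from the paper): one transfer per node per round.
    """
    if any(size <= 0 for _, _, _, size in transfers):
        raise ValueError("file sizes must be positive")
    pending = list(transfers)
    rounds = []
    while pending:
        # For each (src, dst) pair keep the largest pending file as the edge.
        edge = {}
        for t in pending:
            key = (t[1], t[2])
            if key not in edge or t[3] > edge[key][3]:
                edge[key] = t
        pairs = max_weight_matching({k: t[3] for k, t in edge.items()})
        chosen = [edge[p] for p in pairs]
        rounds.append(chosen)
        for t in chosen:
            pending.remove(t)
    return rounds
```

For example, with four transfers between two storage nodes and two compute nodes, `schedule_rounds([("f1", "s1", "c1", 10), ("f2", "s1", "c2", 4), ("f3", "s2", "c1", 6), ("f4", "s2", "c2", 8)])` places `f1` and `f4` in the first round and `f2` and `f3` in the second. A real scheduler would replace the exhaustive search with a polynomial-time matching algorithm and would account for heterogeneous link bandwidths.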