A data locality aware online scheduling approach for I/O-intensive jobs with file sharing

  • Authors: Gaurav Khanna; Umit Catalyurek; Tahsin Kurc; P. Sadayappan; Joel Saltz

  • Affiliations: Dept. of Computer Science and Engineering, The Ohio State University; Dept. of Biomedical Informatics, The Ohio State University; Dept. of Biomedical Informatics, The Ohio State University; Dept. of Computer Science and Engineering, The Ohio State University; Dept. of Biomedical Informatics, The Ohio State University

  • Venue: JSSPP'06: Proceedings of the 12th International Conference on Job Scheduling Strategies for Parallel Processing
  • Year: 2006

Abstract

Many scientific investigations must deal with large amounts of data from simulations and experiments. Data analysis in such investigations typically involves extracting subsets of the data and then performing computations on the extracted data. Scheduling in this context requires efficient use of computational, storage, and network resources to optimize response time. The data-intensive nature of such applications calls for job scheduling algorithms that are aware of data locality. This paper proposes a hypergraph-based dynamic scheduling heuristic for a stream of independent, I/O-intensive jobs that share files. The heuristic is based on an event-driven, run-time hypergraph model of the file-sharing characteristics among jobs. Our experiments on a coupled compute/storage cluster show that it outperforms previously proposed strategies across a range of parameter settings, for workloads from the application domain of biomedical image analysis.
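
To make the hypergraph idea concrete, the following is a minimal Python sketch, not the heuristic from the paper. It assumes that jobs are vertices and that each file is a hyperedge connecting the jobs that read it, and it updates the model on job arrival and completion events. The names `FileSharingHypergraph`, `sharing_jobs`, and `pick_node`, and the greedy "most local bytes" placement rule, are hypothetical and serve only as illustration.

```python
from collections import defaultdict


class FileSharingHypergraph:
    """Assumed model: jobs are vertices, each file is a hyperedge over its readers."""

    def __init__(self):
        self.file_to_jobs = defaultdict(set)  # hyperedge -> vertices
        self.job_to_files = {}                # vertex -> incident hyperedges

    def add_job(self, job, files):
        # Event: a job arrives in the stream.
        self.job_to_files[job] = set(files)
        for f in files:
            self.file_to_jobs[f].add(job)

    def remove_job(self, job):
        # Event: a job completes; drop it and any hyperedge left empty.
        for f in self.job_to_files.pop(job):
            self.file_to_jobs[f].discard(job)
            if not self.file_to_jobs[f]:
                del self.file_to_jobs[f]

    def sharing_jobs(self, job):
        # Jobs that share at least one file with `job`.
        related = set()
        for f in self.job_to_files[job]:
            related |= self.file_to_jobs[f]
        related.discard(job)
        return related


def pick_node(job_files, node_cache, file_size):
    """Hypothetical greedy rule: pick the node already holding the most
    bytes of the job's files (ties go to the smallest node name)."""
    def local_bytes(node):
        return sum(file_size[f] for f in job_files if f in node_cache[node])
    return max(sorted(node_cache), key=local_bytes)
```

In this sketch, `sharing_jobs` exposes which pending jobs could benefit from being placed near one another, and `pick_node` stands in for whatever locality-aware placement decision a real scheduler would make from that information.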