Scalable and distributed mechanisms for integrated scheduling and replication in data grids

  • Authors:
  • Anirban Chakrabarti; Shubhashis Sengupta

  • Affiliations:
  • Software Engineering and Technology Laboratory, Infosys Technologies Ltd., Bangalore, India; Software Engineering and Technology Laboratory, Infosys Technologies Ltd., Bangalore, India

  • Venue:
  • ICDCN'08 Proceedings of the 9th international conference on Distributed computing and networking
  • Year:
  • 2008

Abstract

Data Grids seek to harness geographically distributed resources for large-scale, data-intensive problems. Research issues in Data Grids include resource management for both computation and data. Computation management comprises job scheduling, load balancing, fault tolerance, and response time, while data management includes replication and movement of data at selected sites. Because jobs are data intensive, data management issues often become integral to the problems of scheduling and effective resource management in Data Grids. Integrating data replication and scheduling strategies is therefore important. Such integrated solutions are either non-existent or work in a centralized manner, which is not scalable. This paper addresses the problem of integrating scheduling and replication strategies in a distributed manner. As part of the solution, we propose a Distributed Replication and Scheduling Strategy (DistReSS), which aims at iterative improvement of performance based on the coupling between scheduling and replication, achieved in a distributed and hierarchical fashion. Results suggest that, in the context of our experiments, DistReSS performs comparably to the centralized approach when its parameters are tuned properly, while being more scalable than the centralized approach.