A PTS-PGATS based approach for data-intensive scheduling in data grids
Frontiers of Computer Science in China
In data grid environments, data-intensive applications require large amounts of data to execute, and data transfer is a primary cause of job execution delay. In this paper, we study smart scheduling integrated with replica management optimization to improve system performance. We study the use of a Genetic Algorithm (GA) for the scheduling phase of data-intensive applications. The proposed schedulers incorporate information about the datasets, and their replicas, needed by the jobs to be scheduled, and co-schedule the jobs and the datasets to the computation node that guarantees the minimum job execution time. We employ a data grid replica management framework for the optimization phase of the replica distribution. With this approach we aim for a double optimization effect from both the replica management and the scheduling phases, integrating scheduling and data replication to improve the performance of the grid system. We evaluate our GA-based scheduler and compare it with a Tabu Search (TS) scheduler and the de facto Max-Min scheduler.
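
The replica-aware scheduling idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy instance (node speeds, dataset sizes, replica locations, a uniform bandwidth), the makespan objective, and all names such as `job_time` and `genetic_schedule` are assumptions made for illustration. Each chromosome assigns one node to every job, and a job's cost on a node is its compute time plus the transfer time of every required dataset that has no replica on that node.

```python
import random

# Hypothetical toy instance; every number here is an illustrative assumption.
NODE_SPEED = [10.0, 20.0, 15.0]                          # compute units per second
DATASET_SIZE = {"d0": 400.0, "d1": 900.0, "d2": 250.0}   # MB
REPLICAS = {"d0": {0}, "d1": {1, 2}, "d2": {2}}          # dataset -> nodes with a replica
BANDWIDTH = 50.0                                         # MB/s, assumed uniform
JOBS = [                                                 # (compute length, datasets needed)
    (200.0, ["d0"]),
    (300.0, ["d1"]),
    (100.0, ["d0", "d2"]),
    (250.0, ["d2"]),
    (150.0, ["d1"]),
    (400.0, ["d0", "d1"]),
]


def job_time(job, node):
    """Compute time plus transfer time for datasets not replicated on the node."""
    length, datasets = JOBS[job]
    transfer = sum(DATASET_SIZE[d] / BANDWIDTH
                   for d in datasets if node not in REPLICAS[d])
    return transfer + length / NODE_SPEED[node]


def makespan(assignment):
    """Fitness: finishing time of the most loaded node (lower is better)."""
    load = [0.0] * len(NODE_SPEED)
    for job, node in enumerate(assignment):
        load[node] += job_time(job, node)
    return max(load)


def genetic_schedule(pop_size=30, generations=60, mutation_rate=0.1, seed=0):
    """Plain GA: tournament selection, one-point crossover, mutation, elitism."""
    rng = random.Random(seed)
    n_jobs, n_nodes = len(JOBS), len(NODE_SPEED)
    population = [[rng.randrange(n_nodes) for _ in range(n_jobs)]
                  for _ in range(pop_size)]
    best = min(population, key=makespan)
    for _ in range(generations):
        next_gen = [best[:]]                      # elitism: keep the best so far
        while len(next_gen) < pop_size:
            p1 = min(rng.sample(population, 3), key=makespan)
            p2 = min(rng.sample(population, 3), key=makespan)
            cut = rng.randrange(1, n_jobs)        # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(n_nodes) if rng.random() < mutation_rate else g
                     for g in child]              # per-gene mutation
            next_gen.append(child)
        population = next_gen
        best = min(population, key=makespan)
    return best, makespan(best)


if __name__ == "__main__":
    schedule, span = genetic_schedule()
    print("job -> node:", schedule, "makespan:", round(span, 2))
```

Because the best individual is carried over unchanged in every generation, the best makespan can never get worse as generations proceed. A Tabu Search or Max-Min baseline could be compared against this sketch by reusing the same `makespan` function as the objective.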