The AppLeS parameter sweep template: user-level middleware for the grid
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Journal of Parallel and Distributed Computing
Computers and Intractability: A Guide to the Theory of NP-Completeness
Computers and Intractability: A Guide to the Theory of NP-Completeness
Dynamic Matching and Scheduling of a Class of Independent Tasks onto Heterogeneous Computing Systems
HCW '99 Proceedings of the Eighth Heterogeneous Computing Workshop
Heuristics for Scheduling Parameter Sweep Applications in Grid Environments
HCW '00 Proceedings of the 9th Heterogeneous Computing Workshop
Some simplified NP-complete problems
STOC '74 Proceedings of the sixth annual ACM symposium on Theory of computing
Distributed computing in practice: the Condor experience: Research Articles
Concurrency and Computation: Practice & Experience - Grid Performance
Scheduling tasks sharing files on heterogeneous master-slave platforms
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
Hi-index | 0.00 |
Almost every computation task requires input data in order to find a solution. This is not a problem for a centralized system because data is usually available locally. However, in a parallel and distributed system, e.g., computation grids, the data may be in remote sites and must be transferred to the local site before the computation can proceed. As a result, the interleaved sequence of data transfer and job execution has a significant impact on the overall computational efficiency. In this paper, we analyze the computational complexity of the shared-data job scheduling problem on uniprocessor, with and without consideration of the storage capacity constraint on the local site. We show that if there is an upper bound on the server capacity, the problem is NP-complete, even when each job depends on at most two data items. For the case where there is no upper bound on the server capacity, we show that there exists an efficient algorithm that can provide an optimal job schedule when each job depends on at most two data items. We also propose an efficient heuristic algorithm that can determine good schedules for cases where there is no limit on the amount of data a job may access. The reported experiment results demonstrate that this heuristic algorithm performs very well, and derives near optimal solutions.