Computation and communication schedule optimization for data-sharing tasks on uniprocessor

Authors:
Jan-Jan Wu;En-Jan Chou;Pangfeng Liu
Affiliations:
Institute of Information Science, Academia Sinica, Taipei, Taiwan;Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan;Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Venue:
Journal of Systems Architecture: the EUROMICRO Journal
Year:
2009

Citing 8
Cited 0

The AppLeS parameter sweep template: user-level middleware for the grid

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems

Journal of Parallel and Distributed Computing
Computers and Intractability: A Guide to the Theory of NP-Completeness

Computers and Intractability: A Guide to the Theory of NP-Completeness
Dynamic Matching and Scheduling of a Class of Independent Tasks onto Heterogeneous Computing Systems

HCW '99 Proceedings of the Eighth Heterogeneous Computing Workshop
Heuristics for Scheduling Parameter Sweep Applications in Grid Environments

HCW '00 Proceedings of the 9th Heterogeneous Computing Workshop
Some simplified NP-complete problems

STOC '74 Proceedings of the sixth annual ACM symposium on Theory of computing
Distributed computing in practice: the Condor experience: Research Articles

Concurrency and Computation: Practice & Experience - Grid Performance
Scheduling tasks sharing files on heterogeneous master-slave platforms

Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Almost every computation task requires input data in order to find a solution. This is not a problem for a centralized system because data is usually available locally. However, in a parallel and distributed system, e.g., computation grids, the data may be in remote sites and must be transferred to the local site before the computation can proceed. As a result, the interleaved sequence of data transfer and job execution has a significant impact on the overall computational efficiency. In this paper, we analyze the computational complexity of the shared-data job scheduling problem on uniprocessor, with and without consideration of the storage capacity constraint on the local site. We show that if there is an upper bound on the server capacity, the problem is NP-complete, even when each job depends on at most two data items. For the case where there is no upper bound on the server capacity, we show that there exists an efficient algorithm that can provide an optimal job schedule when each job depends on at most two data items. We also propose an efficient heuristic algorithm that can determine good schedules for cases where there is no limit on the amount of data a job may access. The reported experiment results demonstrate that this heuristic algorithm performs very well, and derives near optimal solutions.