Journal of Parallel and Distributed Computing
Static scheduling algorithms for allocating directed task graphs to multiprocessors
ACM Computing Surveys (CSUR)
DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors
IEEE Transactions on Parallel and Distributed Systems
Chimera: AVirtual Data System for Representing, Querying, and Automating Data Derivation
SSDBM '02 Proceedings of the 14th International Conference on Scientific and Statistical Database Management
Giggle: a framework for constructing scalable replica location services
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
A Performance Study of Monitoring and Information Services for Distributed Systems
HPDC '03 Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing
The Grid2003 Production Grid: Principles and Practice
HPDC '04 Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing
A Peer-to-Peer Replica Location Service Based on a Distributed Hash Table
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
The Anatomy of the Grid: Enabling Scalable Virtual Organizations
International Journal of High Performance Computing Applications
Experiments with in-transit processing for data intensive grid workflows
GRID '07 Proceedings of the 8th IEEE/ACM International Conference on Grid Computing
The reliability analysis of resiliency framework for Grid Services
ACST '08 Proceedings of the Fourth IASTED International Conference on Advances in Computer Science and Technology
Using agreement services in grid computing
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Hi-index | 0.00 |
A grid consists of high-end computational, storage, and network resources that, while known a priori, are dynamic with respect to activity and availability. Efficient scheduling of requests to use grid resources must adapt to this dynamic environment while meeting administrative policies. In this paper, we describe a framework called SPHINX that can administrate grid policies, and schedule complex and data intensive scientific applications. We present experimental results for several scheduling strategies that effectively utilize the monitoring and job-tracking information provided by SPHINX. These results demonstrate that SPHINX can effectively schedule work across a large number of distributed clusters that are owned by multiple units in a virtual organization in a fault-tolerant way in spite of the highly dynamic nature of the grid and complex policy issues. The novelty lies in use of effective monitoring of resources and job execution tracking in making scheduling decisions and fault-tolerance - something that is missed in todayýs grid environments.