Data Grids seek to harness geographically distributed resources for large-scale data-intensive problems such as those encountered in high energy physics, bioinformatics, and other disciplines. These problems typically involve numerous, loosely coupled jobs that both access and generate large data sets. Effective scheduling in such environments is challenging because of the need to address a variety of metrics and constraints (e.g., resource utilization, response time, global and local allocation policies) while dealing with multiple, potentially independent sources of jobs and a large number of storage, compute, and network resources.

We describe a scheduling framework that addresses these problems. Within this framework, data movement operations may be either tightly bound to job scheduling decisions or performed by a decoupled, asynchronous process on the basis of observed data access patterns and load. We develop a family of job scheduling and data movement (replication) algorithms and use simulation studies to evaluate various combinations. Our results suggest that while it is necessary to consider the impact of replication on the scheduling strategy, it is not always necessary to couple data movement and computation scheduling. Instead, these two activities can be addressed separately, significantly simplifying the design and implementation of the overall Data Grid system.
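The decoupled design can be illustrated with a minimal sketch, assuming a toy model that is not the authors' algorithm family. A job scheduler places each job at the least-loaded site that already holds its input dataset, while a separate replication step, run only periodically, copies datasets whose observed access count reaches a threshold to the least-loaded site that lacks them. The names `Site`, `schedule_job`, `replicate_popular`, and `run`, and the `threshold` and `replicate_every` parameters, are hypothetical. Load is approximated by a cumulative count of assigned jobs (jobs never complete), and transfer costs are not modeled.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Site:
    # Hypothetical toy site: the datasets it stores and a load proxy.
    name: str
    datasets: set = field(default_factory=set)
    queue_len: int = 0  # cumulative jobs assigned; never drains in this toy model


def schedule_job(sites, dataset):
    """Job scheduling: prefer sites already holding the dataset, then lowest load.

    If no site holds it, fall back to the least-loaded site overall
    (the remote fetch that would follow is not modeled here).
    """
    holders = [s for s in sites if dataset in s.datasets]
    candidates = holders or sites
    chosen = min(candidates, key=lambda s: (s.queue_len, s.name))
    chosen.queue_len += 1
    return chosen


def replicate_popular(sites, access_counts, threshold):
    """Decoupled replication: for each dataset accessed at least `threshold`
    times, add one new replica at the least-loaded site lacking it."""
    new_replicas = []
    for dataset, count in access_counts.items():
        if count < threshold:
            continue
        lacking = [s for s in sites if dataset not in s.datasets]
        if not lacking:
            continue  # already replicated everywhere
        target = min(lacking, key=lambda s: (s.queue_len, s.name))
        target.datasets.add(dataset)
        new_replicas.append((dataset, target.name))
    return new_replicas


def run(jobs, sites, replicate_every, threshold):
    """Schedule jobs one by one; run replication only every `replicate_every`
    jobs, based on the access counts observed so far."""
    counts = Counter()
    placements = []
    for i, dataset in enumerate(jobs, start=1):
        site = schedule_job(sites, dataset)
        counts[dataset] += 1
        placements.append(site.name)
        if i % replicate_every == 0:
            replicate_popular(sites, counts, threshold)
    return placements


if __name__ == "__main__":
    sites = [Site("A", {"d1"}), Site("B"), Site("C")]
    print(run(["d1"] * 6, sites, replicate_every=2, threshold=2))
    # → ['A', 'A', 'B', 'B', 'C', 'C']
```

In this toy run, the job scheduler never moves data itself; load spreads from site A to B and C only because the independent replication step reacts to the access pattern. Setting `threshold` high enough that replication never fires keeps every job at site A, which mirrors the point that replication must still be considered alongside the scheduling strategy even when the two are not coupled.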