In many scientific workflows, particularly those that operate on spatially oriented data, jobs that process adjacent regions of space often reference large numbers of files in common. When such workflows are scheduled by planning algorithms that are unaware of the application's file reference pattern, the result is a huge number of redundant file transfers between Grid sites and, consequently, poor performance. This work presents a generalized approach to planning spatial workflow schedules for Grid execution based on the spatial proximity of files and the spatial range of jobs. We evaluate our solution using the file access pattern of an astronomy application that performs co-addition of images from the Sloan Digital Sky Survey. In initial tests on Grids of 5 to 25 sites, our spatial clustering approach eliminates 50% to 90% of the file transfers between Grid sites relative to the next-best planning algorithms we tested that were not "spatially aware". At moderate levels of concurrent file transfer, this reduction of redundant network I/O improves application execution time by 30% to 70% and reduces Grid network and storage overhead. The approach is broadly applicable to a wide range of spatially oriented problems.
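The effect described above can be illustrated with a toy model. The following is a minimal sketch, not the planning algorithm from this work: it assumes a one-dimensional strip of jobs, a hypothetical access pattern in which job `i` reads files `i-2` through `i+2`, and a cost model in which each site fetches each distinct file it needs exactly once. The function names (`files_for_job`, `round_robin`, `spatial_blocks`, `transfers`) are invented for illustration.

```python
def files_for_job(job, reach=2):
    """Hypothetical access pattern: job i reads files i-reach .. i+reach."""
    return set(range(job - reach, job + reach + 1))


def transfers(assignment):
    """Count staged file copies, assuming each site fetches each distinct file once."""
    total = 0
    for jobs in assignment.values():
        needed = set()
        for job in jobs:
            needed |= files_for_job(job)
        total += len(needed)
    return total


def round_robin(n_jobs, n_sites):
    """Spatially unaware baseline: job j goes to site j mod n_sites."""
    return {s: [j for j in range(n_jobs) if j % n_sites == s]
            for s in range(n_sites)}


def spatial_blocks(n_jobs, n_sites):
    """Spatially aware assignment: each site gets one contiguous block of jobs."""
    size = -(-n_jobs // n_sites)  # ceiling division
    return {s: list(range(s * size, min((s + 1) * size, n_jobs)))
            for s in range(n_sites)}


rr = transfers(round_robin(100, 5))
sp = transfers(spatial_blocks(100, 5))
print(rr, sp)
```

With 100 jobs on 5 sites, the round-robin assignment stages 500 file copies while the contiguous assignment stages 120, because neighbouring jobs placed on the same site share most of their inputs. These figures are properties of the toy model only and are not the measurements reported for the Sloan Digital Sky Survey workload.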