Long-running multi-physics coupled parallel applications have gained prominence in recent years. Their high computational requirements and long simulation durations necessitate the use of multiple systems of a Grid for execution. In this paper, we present an adaptive middleware framework for executing long-running multi-physics coupled applications across multiple batch systems of a Grid. Besides coordinating the execution of an application's component jobs on different batch systems, our framework automatically and repeatedly resubmits the jobs to the batch queues to sustain long-running executions. As the set of active batch systems available for execution changes, the framework migrates and reschedules components using a robust rescheduling decision algorithm. We have used the framework to improve the application throughput of a prominent long-running multi-component climate modeling application, the Community Climate System Model (CCSM). Our real multi-site experiments with CCSM indicate that Grid executions can improve application throughput for climate models.
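The resubmission idea mentioned above can be illustrated with a toy simulation. This is only a minimal sketch and not the framework's actual logic: it assumes that each batch submission can advance a simulation by a bounded number of steps (a stand-in for a queue's execution time limit) and that progress is carried over between submissions. The function name `run_with_resubmission` and all of its parameters are hypothetical.

```python
def run_with_resubmission(total_steps, steps_per_submission, max_submissions):
    """Simulate a long job executed as a chain of batch submissions.

    Assumption (not from the source): each submission advances the job by
    at most `steps_per_submission` steps, and the next submission resumes
    from the step count reached so far.
    """
    completed = 0      # hypothetical saved progress, in simulation steps
    submissions = 0    # number of times the job was submitted to a queue
    while completed < total_steps and submissions < max_submissions:
        submissions += 1
        # One batch slot: run until the slot is used up or the job finishes.
        completed = min(total_steps, completed + steps_per_submission)
    return completed, submissions


if __name__ == "__main__":
    done, count = run_with_resubmission(
        total_steps=1000, steps_per_submission=300, max_submissions=10
    )
    print(f"completed {done} steps in {count} submissions")
```

In this toy model the job finishes only if enough submissions are allowed; a real middleware would additionally have to handle queue waiting times, failures, and the choice of batch system for each resubmission.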