Virtualization and cloud computing technologies now make it possible to create scalable and reliable virtual high performance computing clusters. Integrating these technologies, however, is complicated by fundamental differences in how these systems allocate resources to computational tasks. Cloud computing systems either allocate available resources immediately or deny the request. In contrast, parallel computing systems route all requests through a queue for future resource allocation. This divergence in allocation policies hinders efforts to implement efficient, responsive, and reliable virtual clusters. In this paper, we present a continuum of four scheduling policies, along with an analytical resource prediction model for each policy, to estimate the level of resources needed to operate an efficient, responsive, and reliable virtual cluster system. We show that it is possible to estimate the size of the virtual cluster system needed to provide a predictable grade of service for a realistic high performance computing workload, and to estimate the queue wait time for a partial or full resource allocation. Moreover, we show that it is possible to provide a reliable virtual cluster system using a limited pool of spare resources. The models and results we present are useful for cloud computing providers seeking to operate efficient and cost-effective virtual cluster systems.
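The contrast between "allocate or deny" and "queue everything" can be illustrated with two classic teletraffic formulas. The sketch below is not the paper's model; it assumes Poisson arrivals and identical single-node jobs, and uses Erlang B for the loss (cloud-style) case and Erlang C with exponential service times for the queued (batch-style) case. All function names and the example load are hypothetical.

```python
def erlang_b(servers: int, load: float) -> float:
    """Blocking probability of a loss system (requests denied when full).

    `load` is the offered load in erlangs (arrival rate * mean service time).
    Uses the standard numerically stable recurrence.
    """
    b = 1.0
    for k in range(1, servers + 1):
        b = load * b / (k + load * b)
    return b


def erlang_c(servers: int, load: float) -> float:
    """Probability that an arriving request must wait in a queued system."""
    if load >= servers:
        raise ValueError("queue is unstable: load must be below server count")
    b = erlang_b(servers, load)
    return servers * b / (servers - load * (1.0 - b))


def mean_queue_wait(servers: int, load: float, mean_service: float) -> float:
    """Mean time spent waiting in the queue (M/M/n assumption)."""
    return erlang_c(servers, load) * mean_service / (servers - load)


def min_servers_for_blocking(load: float, target: float) -> int:
    """Smallest pool size whose blocking probability is at most `target`."""
    n = 1
    while erlang_b(n, load) > target:
        n += 1
    return n


if __name__ == "__main__":
    offered = 10.0  # hypothetical offered load in erlangs
    n = min_servers_for_blocking(offered, 0.01)
    print("nodes for 1% denial rate:", n)
    print("mean queue wait (hours) with 1-hour jobs:",
          mean_queue_wait(n, offered, 1.0))
```

Under these assumptions, the same pool size yields two different service measures: a denial probability when requests are never queued, and a waiting probability and mean wait when they always are. Real high performance computing workloads are neither Poisson nor single-node, so the numbers serve only to show the shape of the trade-off.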