Performance Evaluation of Cloud Service Considering Fault Recovery
CloudCom '09 Proceedings of the 1st International Conference on Cloud Computing
Cloud computing is a recent trend in IT that has attracted considerable attention. Service reliability and service performance are two important concerns in cloud computing. Fault tolerance techniques such as fault recovery can improve cloud service reliability, but they also affect cloud service performance, and this effect deserves detailed study. Although some research exists on cloud and grid service reliability and performance, very little of it addresses fault recovery and its impact on service performance. In this paper, we study the performance evaluation of cloud services in the presence of fault recovery. We consider recovery on both processing nodes and communication links. The commonly adopted assumption of Poisson arrivals of users' service requests is relaxed, so that the interarrival times of service requests can follow an arbitrary probability distribution. The precedence constraints among subtasks are also considered. The probability distribution of the service response time is derived, and a numerical example is presented. The proposed performance evaluation models and methods could yield realistic results and are therefore of practical value for related decision-making in cloud computing.