Using Markov chain analysis to study dynamic behaviour in large-scale grid systems

Authors:
Christopher Dabrowski; Fern Hunt
Affiliations:
National Institute of Standards and Technology, Gaithersburg, MD; National Institute of Standards and Technology, Gaithersburg, MD
Venue:
AusGrid '09 Proceedings of the Seventh Australasian Symposium on Grid Computing and e-Research - Volume 99
Year:
2009

Abstract

In large-scale grid systems with decentralized control, the interactions of many service providers and consumers will likely lead to emergent global system behaviours that result in unpredictable, often detrimental, outcomes. This possibility argues for developing analytical tools that allow complex system behaviour to be understood and predicted, in order to ensure the availability and reliability of grid computing services. This paper presents an approach that uses piecewise homogeneous discrete-time Markov chains to provide rapid, potentially scalable, simulation of large-scale grid systems. The approach, previously used in other domains, is applied here to model the dynamics of such systems. A Markov chain model of a grid system is first represented in a reduced, compact form. This model can then be perturbed to produce alternative system execution paths and to identify scenarios in which system performance is likely to degrade or anomalous behaviours are likely to occur. The expeditious generation of these scenarios allows prediction of how a larger system will react to failures or high-stress conditions. Though computational effort increases in proportion to the number of paths modelled, this cost is shown to be far less than the cost of using detailed simulation or testbeds. Moreover, the cost is unaffected by the size of the system being modelled, expressed in terms of workload and number of computational resources, and the approach is adaptable to systems that are non-homogeneous with respect to time. The paper provides detailed examples of the application of this approach.
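The idea of perturbing a piecewise homogeneous discrete-time Markov chain can be illustrated with a minimal Python sketch. It is not the authors' model: the four task states, the two time periods, and every probability value below are hypothetical, chosen only to show the mechanics. Each time period has its own fixed transition matrix, the state distribution is advanced by repeated vector-matrix multiplication, and a perturbation moves probability mass between two entries of one row so that the row still sums to 1.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Hypothetical task states (assumption, not taken from the paper).
STATES = ["waiting", "running", "completed", "failed"]


def step(dist: List[float], P: Matrix) -> List[float]:
    """Advance the state distribution by one time step: dist' = dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def simulate(dist: List[float], periods: List[Tuple[Matrix, int]]) -> List[float]:
    """Run a piecewise homogeneous chain.

    `periods` is a list of (transition matrix, number of steps); the matrix
    is held fixed within a period and may change between periods.
    """
    for P, n_steps in periods:
        for _ in range(n_steps):
            dist = step(dist, P)
    return dist


def perturb(P: Matrix, row: int, src: int, dst: int, delta: float) -> Matrix:
    """Return a copy of P with `delta` moved from P[row][src] to P[row][dst].

    Moving mass within a single row keeps that row summing to 1.
    """
    if not 0.0 <= delta <= P[row][src]:
        raise ValueError("delta must lie between 0 and P[row][src]")
    Q = [r[:] for r in P]
    Q[row][src] -= delta
    Q[row][dst] += delta
    return Q


# Illustrative matrices for two time periods (all values are made up).
P_EARLY: Matrix = [
    [0.60, 0.40, 0.00, 0.00],  # waiting
    [0.00, 0.70, 0.28, 0.02],  # running
    [0.00, 0.00, 1.00, 0.00],  # completed (absorbing)
    [0.00, 0.00, 0.00, 1.00],  # failed (absorbing)
]
P_LATE: Matrix = [
    [0.80, 0.20, 0.00, 0.00],
    [0.00, 0.80, 0.18, 0.02],
    [0.00, 0.00, 1.00, 0.00],
    [0.00, 0.00, 0.00, 1.00],
]

if __name__ == "__main__":
    start = [1.0, 0.0, 0.0, 0.0]  # every task begins in "waiting"
    baseline = simulate(start, [(P_EARLY, 50), (P_LATE, 50)])

    # Alternative scenario: in the early period, shift 0.10 of the
    # running -> completed probability to running -> failed.
    stressed_early = perturb(P_EARLY, row=1, src=2, dst=3, delta=0.10)
    stressed = simulate(start, [(stressed_early, 50), (P_LATE, 50)])

    for name, b, s in zip(STATES, baseline, stressed):
        print(f"{name:>9}: baseline={b:.4f} perturbed={s:.4f}")
```

In this sketch the state vector holds proportions of tasks rather than individual tasks, so the work per step depends only on the number of states and time periods. That is one plausible reading of the abstract's statement that cost does not depend on workload or resource count. Generating more scenarios means calling `perturb` and `simulate` once per scenario, which matches the stated proportionality to the number of paths modelled.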