In large-scale grid systems with decentralized control, the interactions of many service providers and consumers are likely to produce emergent global system behaviours with unpredictable, often detrimental, outcomes. This possibility argues for analytical tools that support understanding and prediction of complex system behaviour, so that the availability and reliability of grid computing services can be ensured. This paper presents an approach that uses piece-wise homogeneous discrete-time Markov chains to provide rapid, potentially scalable simulation of large-scale grid systems. The approach, previously used in other domains, is applied here to model the dynamics of such systems. A Markov chain model of a grid system is first represented in a reduced, compact form. The model is then perturbed to produce alternative system execution paths and to identify scenarios in which system performance is likely to degrade or anomalous behaviours are likely to occur. Because these scenarios can be generated quickly, they allow prediction of how a larger system will react to failures or high-stress conditions. Although computational effort increases in proportion to the number of paths modelled, this cost is shown to be far less than that of detailed simulation or testbeds. Moreover, the cost is unaffected by the size of the system being modelled, expressed in terms of workload and number of computational resources, and the approach is adaptable to systems that are non-homogeneous with respect to time. The paper provides detailed examples of the application of this approach.
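The general idea of a piece-wise homogeneous discrete-time Markov chain and its perturbation can be illustrated with a minimal sketch. It is not the paper's model: the four task states, all transition probabilities, the period lengths, and the function names `step`, `evolve`, and `perturb` are hypothetical choices made only for illustration. The chain uses one transition matrix per time period (constant within a period, different across periods), and a perturbation changes a single transition probability while rescaling the rest of its row so that the row still sums to 1.

```python
# Hypothetical task states (an assumption, not taken from the paper):
# 0 = waiting, 1 = running, 2 = completed (absorbing), 3 = failed (absorbing)

def step(dist, P):
    """Advance the state distribution by one time step: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def evolve(dist, periods):
    """Evolve dist through a list of (matrix, n_steps) periods.

    The matrix is held constant within each period, which is what makes
    the chain piece-wise homogeneous.
    """
    for P, n_steps in periods:
        for _ in range(n_steps):
            dist = step(dist, P)
    return dist

def perturb(P, i, j, new_p):
    """Return a copy of P with P[i][j] set to new_p and the other entries
    of row i rescaled proportionally so the row still sums to 1."""
    row = P[i]
    rest = sum(row) - row[j]
    if rest <= 0:
        raise ValueError("row has no other probability mass to rescale")
    scale = (1.0 - new_p) / rest
    new_row = [new_p if k == j else row[k] * scale for k in range(len(row))]
    return [new_row if r == i else list(P[r]) for r in range(len(P))]

# Illustrative matrices for two time periods (made-up values).
P_early = [[0.6, 0.4, 0.00, 0.00],
           [0.0, 0.7, 0.28, 0.02],
           [0.0, 0.0, 1.00, 0.00],
           [0.0, 0.0, 0.00, 1.00]]
P_late = [[0.8, 0.2, 0.00, 0.00],
          [0.0, 0.7, 0.28, 0.02],
          [0.0, 0.0, 1.00, 0.00],
          [0.0, 0.0, 0.00, 1.00]]

start = [1.0, 0.0, 0.0, 0.0]  # every task begins in the waiting state

baseline = evolve(start, [(P_early, 50), (P_late, 50)])

# Perturbation: lower the running -> completed probability in the early period.
P_early_stressed = perturb(P_early, 1, 2, 0.10)
stressed = evolve(start, [(P_early_stressed, 50), (P_late, 50)])

print("baseline completed fraction:", baseline[2])
print("stressed completed fraction:", stressed[2])
```

In this sketch the state vector holds proportions of tasks rather than individual tasks, so the work per step depends on the number of states and time steps, not on how many tasks or resources the modelled system contains. That is one plausible reading of the abstract's claim that cost is unaffected by system size. Each additional perturbed path costs one more call to `evolve`, which matches the stated proportional growth in effort with the number of paths.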