Parameterised architectural patterns for providing cloud service fault tolerance with accurate costings

  • Authors:
  • Iman I. Yusuf; Heinz W. Schmidt

  • Affiliations:
  • RMIT University, Melbourne, Australia; RMIT University, Melbourne, Australia

  • Venue:
  • Proceedings of the 16th International ACM SIGSOFT Symposium on Component-Based Software Engineering
  • Year:
  • 2013


Abstract

Cloud computing presents a unique opportunity for science and engineering, offering benefits over traditional high-performance computing, especially for smaller compute jobs and for users new to parallel computing. However, doubts remain about production high-performance computing in the cloud, the so-called science cloud, since predictable performance, reliability and therefore costs remain elusive for many applications. This paper uses parameterised architectural patterns to assist with fault tolerance and cost predictions for science clouds, in which a single job typically holds many virtual machines for a long time, communication can involve massive data movements, and buffered streams allow parallel processing to proceed while data transfers are still incomplete. We utilise predictive models, simulation and actual runs to estimate run times with acceptable accuracy for two of the most common architectural patterns for data-intensive scientific computing: MapReduce and Combinational Logic. Run times are fundamental to understanding the fee-for-service costs of clouds, which are typically charged by the hour and by the number of compute nodes or cores used. We evaluate our models using realistic cloud experiments from collaborative physics research projects and show that proactive and reactive fault tolerance is manageable, predictable and composable, in principle, especially at the architectural level.
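The abstract notes that cloud fees are typically charged per hour and per compute node or core, which is why run-time prediction translates directly into cost prediction. A minimal sketch of that billing model is below; the function name and the $0.20/node-hour rate are illustrative assumptions, not figures from the paper.

```python
import math

def cloud_cost(runtime_hours: float, nodes: int, rate_per_node_hour: float) -> float:
    """Estimate fee-for-service cost under per-hour, per-node billing.

    Clouds commonly round each node's usage up to a whole hour, so an
    over-estimated run time can add a full billed hour per node.
    (Illustrative model only; actual providers' billing rules vary.)
    """
    billed_hours = math.ceil(runtime_hours)  # partial hours are rounded up
    return billed_hours * nodes * rate_per_node_hour

# e.g. a 10.5-hour job on 16 nodes at a hypothetical $0.20/node-hour:
# billed as 11 hours x 16 nodes x $0.20 = $35.20
print(cloud_cost(10.5, 16, 0.20))
```

The rounding step is what makes accurate run-time estimates matter: predicting 10.5 hours versus 11.1 hours changes the bill by a full node-hour on every node in the job.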