Predictable quality of service atop degradable distributed systems

  • Authors:
  • Lavanya Ramakrishnan;Daniel A. Reed

  • Affiliations:
  • Indiana University, Bloomington, USA;Microsoft Research, Redmond, USA

  • Venue:
  • Cluster Computing
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

High performance and distributed computing systems such as peta-scale, grid and cloud infrastructure are increasingly used for running scientific models and business services. These systems experience large availability variations through hardware and software failures. Resource providers need to account for these variations while providing the required QoS at appropriate costs in dynamic resource and application environments. Although the performance and reliability of these systems have been studied separately, there has been little analysis of the lost Quality of Service (QoS) experienced with varying availability levels. In this paper, we present a resource performability model to estimate lost performance and corresponding cost considerations with varying availability levels. We use the resulting model in a multi-phase planning approach for scheduling a set of deadline-sensitive meteorological workflows atop grid and cloud resources to trade-off performance, reliability and cost. We use simulation results driven by failure data collected over the lifetime of high performance systems to demonstrate how the proposed scheme better accounts for resource availability.