SLA success probability assessment in networks with correlated failures

  • Authors:
  • Andres J. Gonzalez; Bjarne E. Helvik

  • Affiliations:
  • Centre for Quantifiable Quality of Service in Communication Systems, Norwegian University of Science and Technology, O.S. Bragstads Plass 2E, N-7491 Trondheim, Norway (both authors)

  • Venue:
  • Computer Communications
  • Year:
  • 2013

Abstract

Service Level Agreements (SLAs) define the obligations between network/service providers and their customers in business relationships. The terms specifying the availability guaranteed over a given period are fundamental to these contracts. Selecting the availability to promise remains an open challenge for network operators, for three reasons: (i) SLAs are defined over finite periods, so the stochastic properties of the availability over that period must be considered; (ii) real operational networks do not exhibit Markovian properties; and (iii) how failure correlation affects the interval availability in operational networks is unknown. In this work, we show the impact of dependent failures on SLAs, based on operational failure data obtained from the UNINETT network. Using these data, we simulate the behavior of network connections that use shared backup protection. We evaluate the SLA success probability with two different methods. First, we apply trace-driven simulation combined with random circular shifting. Second, we develop a model based on Monte Carlo techniques. This approach includes characterizing the up and down times of each network component and using a model that generates correlated samples from the fitted marginal distributions. Finally, we analyze the probability density function of the interval availability for different observation periods under independent and correlated failures.
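The trace-driven method can be illustrated with a minimal Python sketch. It assumes a simplified connection with one primary and one dedicated backup path that is down only when both paths are down, so the paper's shared backup protection and any switchover time are not modelled. Each path's outages over a trace of length L are given as disjoint (start, end) intervals. The interval availability over an observation period T is A(T) = 1 - (joint downtime in [0, T)) / T, and the SLA success probability is estimated as the fraction of randomly shifted samples with A(T) >= A_SLA. Shifting both traces by the same random offset keeps their joint timing, and hence any correlation between their failures, while shifting each trace independently treats the paths as independent. This is one plausible reading of random circular shifting, not necessarily the authors' exact procedure. The function names `circular_shift`, `joint_downtime` and `sla_success_probability` and the sample traces are hypothetical.

```python
import random
from typing import List, Tuple

# Outage (start, end) pairs in hours: disjoint and contained in [0, length).
Intervals = List[Tuple[float, float]]


def circular_shift(downs: Intervals, offset: float, length: float) -> Intervals:
    """Rotate a trace of outages by `offset` on a circle of circumference `length`."""
    out = []
    for s, e in downs:
        s2, e2 = s + offset, e + offset
        if s2 >= length:            # the whole outage wraps past the end of the trace
            out.append((s2 - length, e2 - length))
        elif e2 > length:           # the outage straddles the wrap point: split it
            out.append((s2, length))
            out.append((0.0, e2 - length))
        else:
            out.append((s2, e2))
    return out


def joint_downtime(a: Intervals, b: Intervals, window: float) -> float:
    """Time in [0, window) during which both paths are down.

    Each list must be internally disjoint so that pairwise overlaps are not double counted.
    """
    total = 0.0
    for s1, e1 in a:
        for s2, e2 in b:
            lo, hi = max(s1, s2, 0.0), min(e1, e2, window)
            if hi > lo:
                total += hi - lo
    return total


def sla_success_probability(primary: Intervals, backup: Intervals, length: float,
                            window: float, a_sla: float, n: int,
                            correlated: bool, rng: random.Random) -> float:
    """Estimate P(A(window) >= a_sla) by random circular shifting (requires window <= length).

    correlated=True shifts both traces by the same offset (keeps their joint timing);
    correlated=False shifts them independently (treats the paths as independent).
    """
    hits = 0
    for _ in range(n):
        o1 = rng.random() * length
        o2 = o1 if correlated else rng.random() * length
        down = joint_downtime(circular_shift(primary, o1, length),
                              circular_shift(backup, o2, length), window)
        interval_availability = 1.0 - down / window
        if interval_availability >= a_sla:
            hits += 1
    return hits / n


if __name__ == "__main__":
    # Hypothetical 1000-hour outage traces; the two paths share one overlapping outage.
    primary = [(100.0, 110.0), (500.0, 520.0)]
    backup = [(105.0, 115.0), (700.0, 705.0)]
    rng = random.Random(1)
    for corr in (True, False):
        p = sla_success_probability(primary, backup, 1000.0, 720.0, 0.999, 5000, corr, rng)
        print(f"correlated={corr}: P(A(720 h) >= 0.999) ~ {p:.3f}")
```

With these toy traces, the single overlapping outage makes the correlated estimate noticeably lower than the independent one, which is the kind of effect the abstract describes. The sketch does not cover the second method, in which up and down times are sampled from fitted marginal distributions through a correlation model.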