Cloud Resource Usage--Heavy Tailed Distributions Invalidating Traditional Capacity Planning Models

  • Author: Charles Loboz
  • Affiliation: Microsoft Corporation, Windows Azure, Redmond, USA
  • Venue: Journal of Grid Computing
  • Year: 2012

Abstract

For years, capacity planning professionals have known or suspected that various characteristics of computer usage have non-normal distributions. At the same time, much of traditional workload modeling and forecasting is based on mathematical techniques that assume some form of normality in the underlying distributions. When the actual distributions differ from the assumed ones, the resulting capacity models are of lower quality, with possibly erroneous forecasts and confidence intervals much wider than expected. This paper analyzes the distribution of daily resource usage on three storage clusters over 478 days. For each day we consider the distribution of resource usage across customer accounts for five resources: storage used, storage transactions executed, internal network transfer, egress transfer, and inter-data-center transfer. This gives 7170 sample distributions in total (3 clusters × 478 days × 5 resources). All distributions were highly imbalanced, and most distribution samples had tails heavier than the log-normal, exponential, or normal distributions. These findings spell significant problems for most models that assume normality. Mathematically, the classical Central Limit Theorem does not apply to power-law distributions with infinite variance, so the "averaging" effect cannot be relied on to help modeling with the traditional approach. Operationally, the very high volatility observed means that capacity buffers need to be large, which wastes capacity; other, administrative, means are needed to reduce that waste. Overall, the distributions of resource usage in cloud storage are so far from normal, even after the usual transformations, that the traditional approach to forecasting and capacity planning needs to be reconsidered. The distributions of log-returns of the time series describing resource usage are much more heavy-tailed than similar distributions for stock indexes. Since no financial professional would use linear regression for stock market analysis and forecasting, it stands to reason that capacity planning should also move toward tools that account for heavy-tailed distributions.
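The heavy-tail claims above can be checked with standard tools. What follows is a minimal sketch, not the paper's method. It assumes a usage series is available as a list of positive floats and computes three things: its log-returns; their excess kurtosis, which is about 0 for normal data; and a Hill estimate of the tail index alpha. An alpha estimate below 2 suggests infinite variance, which is the regime where the classical Central Limit Theorem does not apply. The function names (`log_returns`, `excess_kurtosis`, `hill_tail_index`) and the choice of k are hypothetical. The synthetic Pareto and normal samples are stand-ins, not the paper's data.

```python
import math
import random
import statistics


def log_returns(series):
    """Log-returns r_t = ln(x_t / x_{t-1}) of a series of positive values."""
    return [math.log(b / a) for a, b in zip(series, series[1:])]


def excess_kurtosis(xs):
    """Sample excess kurtosis: about 0 for normal data, large for heavy tails."""
    mean = statistics.fmean(xs)
    m2 = statistics.fmean((x - mean) ** 2 for x in xs)
    m4 = statistics.fmean((x - mean) ** 4 for x in xs)
    return m4 / m2 ** 2 - 3.0


def hill_tail_index(xs, k):
    """Hill estimate of the tail index alpha from the k largest positive values.

    For a power-law tail P(X > x) ~ x**(-alpha), an estimate below 2
    suggests infinite variance (classical CLT does not apply).
    """
    top = sorted((x for x in xs if x > 0), reverse=True)
    if not 0 < k < len(top):
        raise ValueError("k must satisfy 0 < k < number of positive values")
    threshold = top[k]  # the (k+1)-th largest value
    mean_log_excess = sum(math.log(x / threshold) for x in top[:k]) / k
    return 1.0 / mean_log_excess


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-ins (hypothetical), not the paper's measurements.
    heavy = [rng.paretovariate(1.5) for _ in range(20_000)]
    light = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
    print("Hill alpha, Pareto(1.5) sample:", round(hill_tail_index(heavy, 500), 2))
    print("excess kurtosis, normal sample:", round(excess_kurtosis(light), 2))
    print("excess kurtosis, Pareto sample:", round(excess_kurtosis(heavy), 2))
```

With real data, one would apply `hill_tail_index` to per-account usage on a given day, or to the absolute log-returns of a usage series. The estimate depends noticeably on k, so it is usually inspected over a range of k values (a Hill plot) rather than at a single value.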