For years, capacity planning professionals have known or suspected that various characteristics of computer usage have non-normal distributions. At the same time, much of traditional workload modeling and forecasting relies on mathematical techniques that assume some form of normality in the underlying distributions. When the actual distribution differs from the assumed one, the resulting capacity models are of lower quality, with possibly erroneous forecasts and confidence intervals much wider than expected. This paper analyzes the distribution of daily resource usage on three storage clusters over 478 days. For each day we consider the distribution of resource usage by customer account for five different resources: storage used, storage transactions executed, internal network transfer, egress transfer, and inter-data-center transfer, giving 7170 sample distributions in total (3 clusters × 478 days × 5 resources). All distributions were highly imbalanced, and most sample distributions have tails heavier than log-normal, exponential, or normal distributions. These findings spell significant problems for most models that assume normality. Mathematically, the classical Central Limit Theorem does not apply to power-law distributions with infinite variance, so the 'averaging' effect cannot be counted on to help with modeling using the traditional approach. Operationally, the very high volatility found means that 'capacity buffers' need to be large, leading to wasted capacity. Other, administrative means need to be applied to reduce it. Overall, the distributions of resource usage in cloud storage are so far from normal, even after the usual transformations, that the traditional approach to forecasting and capacity planning needs to be reconsidered. The distributions of log-returns of the time series describing resource usage are much more heavy-tailed than the corresponding distributions for stock indexes. Since no financial professional would use linear regression for stock market analysis and forecasting, it stands to reason that capacity planning should also move toward tools that account for heavy-tailed distributions.
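To make the notion of tail heaviness concrete, here is a minimal Python sketch of two quantities the abstract refers to: log-returns of a usage time series, and a tail-index estimate for a usage sample. The abstract does not say which estimator the paper uses. The Hill estimator below is a standard technique swapped in for illustration, and the function names, the synthetic Pareto data, and the choice of `k` are all assumptions rather than anything taken from the study.

```python
import math
import random


def log_returns(series):
    """Log-returns r_t = ln(x_t / x_{t-1}) of a strictly positive series."""
    return [math.log(b / a) for a, b in zip(series, series[1:])]


def hill_tail_index(sample, k):
    """Hill estimate of the tail index alpha from the k largest values.

    Smaller alpha means a heavier tail; alpha <= 2 implies infinite variance.
    """
    ordered = sorted(sample, reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < len(sample)")
    threshold = ordered[k]  # the (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("the k+1 largest values must be positive")
    return k / sum(math.log(x / threshold) for x in ordered[:k])


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Synthetic stand-in for per-account daily usage (NOT the paper's data):
# a Pareto sample with true tail index 1.5, i.e. infinite variance.
usage = [rng.paretovariate(1.5) for _ in range(20000)]
alpha_hat = hill_tail_index(usage, k=2000)
print(f"estimated tail index: {alpha_hat:.2f}")
```

An estimate at or below 2 on real usage data would suggest infinite variance, the regime in which averaging over accounts or days cannot be expected to produce near-normal totals. In practice the estimate is sensitive to `k`, so it is usually inspected over a range of values rather than read off a single one.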