An EM-based technique for approximating long-tailed data sets with PH distributions

  • Authors:
  • Alma Riska;Vesselin Diev;Evgenia Smirni

  • Affiliations:
  • Seagate Research, 1251 Waterfront Place, Pittsburgh, PA;Department of Computer Science, College of William and Mary, Williamsburg, VA;Department of Computer Science, College of William and Mary, Williamsburg, VA

  • Venue:
  • Performance Evaluation - Internet performance symposium (IPS 2002)
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose a new technique for fitting long-tailed data sets into phase-type (PH) distributions. This technique fits data sets with non-monotone densities into a mixture of Erlang and hyperexponential distributions, and data sets with completely monotone densities into hyperexponenfial distributions. The method partitions the data set in a divide-and-conquer fashion and uses the expectation-maximization (EM) algorithm to fit the data of each partition into a hyperexponential distribution. The fits of all partitions are combined to generate the final fit of the entire data set. The proposed method is accurate and computationally efficient. Furthermore, it allows one to apply existing analytic tools to analyze the behavior of queuing systems with long-tailed arrival and/or service processes via tractable models.