A Contribution Towards Solving the Web Workload Puzzle

  • Authors:
  • Katerina Goseva-Popstojanova;Fengbin Li;Xuan Wang;Amit Sangle

  • Affiliations:
  • West Virginia University, Morgantown, WV;West Virginia University, Morgantown, WV;West Virginia University, Morgantown, WV;West Virginia University, Morgantown, WV

  • Venue:
  • DSN '06 Proceedings of the International Conference on Dependable Systems and Networks
  • Year:
  • 2006

Abstract

The World Wide Web, the biggest distributed system ever built, is experiencing tremendous growth and change in Web sites, users, and technology. A realistic and accurate characterization of Web workload is the first, fundamental step in areas such as performance analysis and prediction, capacity planning, and admission control. Compared to previous work, this paper presents a more detailed and rigorous statistical analysis of both request-level and session-level characteristics of Web workload, based on empirical data extracted from the actual logs of four Web servers. Our analysis focuses on phenomena such as self-similarity, long-range dependence, and heavy-tailed distributions. Identifying these phenomena in real data is challenging because the existing methods may perform erratically in practice and produce misleading results. We provide a more accurate analysis of the long-range dependence of the request and session arrival processes by removing the trend and periodicity. In addition to the session arrival process (i.e., inter-session characteristics), we study several intra-session characteristics, using several different methods to test for heavy-tailed behavior and to cross-validate the results. Finally, we point out specific problems associated with the methods used for establishing long-range dependence and heavy-tailed behavior of Web workloads. We believe that the comprehensive model presented in this paper is a step towards solving the Web workload puzzle.
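
The abstract does not name the specific methods used to test for heavy-tailed behavior. As an illustration only, the sketch below uses the Hill estimator, one commonly used estimator of the tail index, and applies it to synthetic Pareto data rather than to the paper's logs. The function names `hill_tail_index` and `pareto_sample`, the sample size, and the choice of `k` are all assumptions made for this example.

```python
import math
import random


def hill_tail_index(samples, k):
    """Hill estimate of the tail index alpha, using the k largest samples.

    Illustrative helper (not from the paper). Smaller alpha means a heavier tail.
    """
    if k < 1 or k >= len(samples):
        raise ValueError("k must satisfy 1 <= k < len(samples)")
    ordered = sorted(samples, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the Hill estimator needs positive tail values")
    # Sum of log-ratios of the k largest values to the threshold.
    log_excess = sum(math.log(x / threshold) for x in ordered[:k])
    return k / log_excess


def pareto_sample(alpha, n, rng):
    """Draw n Pareto(alpha) values with minimum 1 by inverse-transform sampling."""
    # 1 - rng.random() lies in (0, 1], so the power is always defined.
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    data = pareto_sample(alpha=1.5, n=20000, rng=rng)
    for k in (200, 2000, 10000):
        print(f"k={k:5d}  Hill estimate of alpha: {hill_tail_index(data, k):.2f}")
```

On real data the estimate can vary noticeably with `k`, which is one reason the abstract's advice to apply several methods and cross-validate the results is sensible.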