Web traffic modeling at finer time scales and performance implications

  • Authors:
  • Cathy H. Xia;Zhen Liu;Mark S. Squillante;Li Zhang;Naceur Malouch

  • Affiliations:
  • IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA;IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA;IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA;IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA;Laboratoire LIP6-CNRS, Université Pierre et Marie Curie, 8 rue du capitaine Scott, 75015 Paris, France

  • Venue:
  • Performance Evaluation - Long range dependence and heavy tail distributions
  • Year:
  • 2005

Abstract

The performance of Web sites continues to be an important research topic. Such studies are invariably based on the access logs from the servers comprising the Web site. A problem with existing access logs is the coarse granularity of their timestamps, e.g., request arrival times. In this study we demonstrate and quantify the significant differences in performance obtained under diverse assumptions about the arrival process of user requests derived from the access logs; the corresponding user response times can differ by more than an order of magnitude. This motivates the need for a general methodology to construct accurate representations of the actual arrival process of user requests from existing coarse-grained access-log data. Our analysis of the access logs from representative commercial Web sites shows that the arrival process exhibits self-similar behavior. We propose a drill-down methodology for constructing the arrival process at finer time scales based on the self-similar properties of the arrival process observed at the coarse logging time scales. The advantage of our approach is that it maintains consistency between the properties of the arrival processes at the coarser and finer time scales. In addition, our analysis of the request size distribution from commercial Web sites indicates a distribution that is subexponential but not heavy-tailed (power-law). Through simulations, we investigate the impact of these different traffic models on user response times.
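To illustrate why the assumption about arrivals within a coarse timestamp matters, the following is a minimal sketch and not the paper's drill-down methodology. It takes per-second request counts, as a coarse access log would provide, and reconstructs arrival times in two naive ways: all requests at the start of their second, or spread uniformly within it. It then feeds both into a single-server FIFO queue. The function names, the synthetic bursty counts, the exponential service times, and the single-server model are all assumptions made for illustration.

```python
import random


def arrivals_batch(counts):
    """Assumption A: every request logged in second t arrives exactly at time t."""
    return [float(t) for t, n in enumerate(counts) for _ in range(n)]


def arrivals_uniform(counts, rng):
    """Assumption B: requests logged in second t are spread uniformly over [t, t+1)."""
    times = [t + rng.random() for t, n in enumerate(counts) for _ in range(n)]
    times.sort()
    return times


def mean_response_time(arrivals, services):
    """Mean response time (waiting + service) in a single-server FIFO queue."""
    free_at = 0.0  # time at which the server next becomes idle
    total = 0.0
    for arrival, service in zip(arrivals, services):
        start = max(arrival, free_at)
        free_at = start + service
        total += free_at - arrival
    return total / len(arrivals)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical bursty per-second counts standing in for a real access log.
    counts = [rng.choice([0, 0, 0, 1, 2, 8]) for _ in range(10_000)]
    n_requests = sum(counts)
    # Hypothetical service times: exponential with mean 0.4 seconds.
    services = [rng.expovariate(1 / 0.4) for _ in range(n_requests)]

    batch = mean_response_time(arrivals_batch(counts), services)
    uniform = mean_response_time(arrivals_uniform(counts, rng), services)
    print(f"batch-at-second-start: {batch:.3f} s")
    print(f"uniform-within-second: {uniform:.3f} s")
```

Both runs use the same counts and the same service times, so any gap between the two printed means comes only from how arrivals are placed inside each logged second. Neither placement preserves the self-similar structure that the abstract reports for the real traffic.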