Sliding window technique for the web log analysis

  • Authors: Nikolai Buzikashvili
  • Affiliations: Russian Academy of Science, Moscow, Russian Fed.
  • Venue: Proceedings of the 16th international conference on World Wide Web
  • Year: 2007

Abstract

The results of a Web query log analysis can shift significantly depending on the fraction of agents (non-human clients) that are not excluded from the log. To detect and exclude agents, Web log studies use threshold values for the number of requests submitted by a client during the observation period. However, different studies use different observation periods, and a threshold assigned to one period is usually not comparable with a threshold assigned to another. We propose a uniform method that works equally across different observation periods. The method is based on the sliding window technique: a threshold is assigned to the sliding window rather than to the whole observation period. In addition, we determine sub-optimal values of the method's parameters, the window size and the threshold, and recommend 5-7 unique queries as an upper bound on the threshold for a 1-hour sliding window.
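
The sliding window idea described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the record layout `(client_id, timestamp, query)`, the function name `detect_agents`, and the rule that a client is flagged once its unique-query count strictly exceeds the threshold inside any window are all assumptions made for illustration. The defaults (a 3600-second window and a threshold of 7) follow the upper end of the 5-7 range recommended above.

```python
from collections import Counter, defaultdict, deque


def detect_agents(records, window_seconds=3600, threshold=7):
    """Return the ids of clients treated as agents.

    records: iterable of (client_id, timestamp_in_seconds, query) tuples
             (hypothetical layout, assumed for this sketch).
    A client is flagged when the number of unique queries it submits
    within some window of `window_seconds` exceeds `threshold`.
    """
    # Group the log by client so each client is scanned independently.
    by_client = defaultdict(list)
    for client_id, timestamp, query in records:
        by_client[client_id].append((timestamp, query))

    agents = set()
    for client_id, events in by_client.items():
        events.sort()        # chronological order
        window = deque()     # events currently inside the window
        counts = Counter()   # occurrences of each query inside the window
        for timestamp, query in events:
            window.append((timestamp, query))
            counts[query] += 1
            # Drop events that are window_seconds or more older than the
            # current one; the current event itself always stays.
            while timestamp - window[0][0] >= window_seconds:
                _, old_query = window.popleft()
                counts[old_query] -= 1
                if counts[old_query] == 0:
                    del counts[old_query]
            # len(counts) is the number of unique queries in the window.
            if len(counts) > threshold:
                agents.add(client_id)
                break
    return agents


# Made-up data: "bot" sends 20 distinct queries within 200 seconds,
# "human" sends one distinct query every 30 minutes.
records = [("bot", 10 * i, f"q{i}") for i in range(20)]
records += [("human", 1800 * i, f"h{i}") for i in range(10)]
print(detect_agents(records))
```

Because the threshold is tied to the window and not to the whole observation period, the "human" client in this made-up data is not flagged even though it submits 10 unique queries overall: no 1-hour window contains more than two of them.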