On extracting session data from activity logs

  • Authors:
  • David Mehrzadi;Dror G. Feitelson

  • Affiliations:
  • The Hebrew University, Jerusalem, Israel;The Hebrew University, Jerusalem, Israel

  • Venue:
  • Proceedings of the 5th Annual International Systems and Storage Conference
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Activity logs from large-scale systems facilitate the study of user behavior, which can be used to improve and tune the user experience. However, the available data often lacks important elements such as the identification of user sessions. Previous work typically compensated for this by setting a threshold of around 30 minutes, and assuming that breaks in activity longer than the threshold reflect breaks between sessions. We show that using such a global threshold introduces artifacts that may affect the analysis, because there is a high probability that long sessions are not identified correctly. As an alternative, we suggest that a suitable individual threshold be found for each user, based on that user's activity pattern. Applying this approach to a large dataset from the AOL search engine leads to a distribution of session durations that is free of artifacts like those that appear when using a global threshold.