A Framework for the Evaluation of Session Reconstruction Heuristics in Web-Usage Analysis

  • Authors:
  • Myra Spiliopoulou;Bamshad Mobasher;Bettina Berendt;Miki Nakagawa

  • Affiliations:
  • -;-;-;-

  • Venue:
  • INFORMS Journal on Computing
  • Year:
  • 2003

Quantified Score

Hi-index 0.01

Visualization

Abstract

Web-usage mining has become the subject of intensive research, as its potential for personalized services, adaptive Web sites and customer profiling is recognized. However, the reliability of Web-usage mining results depends heavily on the proper preparation of the input datasets. In particular, errors in the reconstruction of sessions and incomplete tracing of users' activities in a site can easily result in invalid patterns and wrong conclusions. In this study, we evaluate the performance of heuristics employed to reconstruct sessions from the server log data. Such heuristics are called to partition activities first by user and then by visit of the user in the site, where user identification mechanisms, such as cookies, may or may not be available. We propose a set of performance measures that are sensitive to two types of reconstruction errors and appropriate for different applications in knowledge discovery (KDD) applications.We have tested our framework on the Web server data of a frame-based Web site. The first experiment concerned a specific KDD application and has shown the sensitivity of the heuristics to particularities of the site's structure and traffic. The second experiment is not bound to a specific application but rather compares the performance of the heuristics for different measures and thus for different application types. Our results show that there is no single best heuristic, but our measures help the analyst in the selection of the heuristic best suited for the application at hand.