Web Usage Mining in Noisy and Ambiguous Environments: Exploring the Role of Concept Hierarchies, Compression, and Robust User Profiles

  • Authors:
  • Olfa Nasraoui;Esin Saka

  • Affiliations:
  • Knowledge Discovery & Web Mining Lab, University of Louisville, Louisville, KY 40292, USA;Knowledge Discovery & Web Mining Lab, University of Louisville, Louisville, KY 40292, USA

  • Venue:
  • From Web to Social Web: Discovering and Deploying User and Content Profiles
  • Year:
  • 2007

Abstract

Recent efforts in Web usage mining have started incorporating more semantics into the data in order to obtain a representation deeper than shallow clicks. In this paper, we review these approaches and examine how simple cues from a website hierarchy can relate clickstream events that would otherwise seem unrelated, and thus perform URL compression. We study the effect of these cues on data reduction and on the quality of the resulting knowledge discovery. Web usage data is also notorious for containing moderate to high amounts of noise, which motivates the use of robust knowledge discovery algorithms that resist noise and outliers to varying degrees. We therefore also examine the effect of robustness on the final quality of the knowledge discovery. Our experimental results show that post-processed and robust user profiles have better quality than raw profiles that are estimated through optimization alone. URL compression, as expected, tends to reduce quality, but it can also drastically reduce the size of the data set, resulting in faster mining.
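
As a rough illustration of what hierarchy-based URL compression could look like, the sketch below truncates each clicked URL to a fixed depth in the site's directory tree, so that pages under the same branch collapse into one item. This is only one possible reading of the idea under stated assumptions, not the procedure used in the paper: the function names `compress_url` and `compress_session`, the `depth` parameter, and the sample URLs are all hypothetical.

```python
from urllib.parse import urlsplit


def compress_url(url: str, depth: int = 2) -> str:
    """Keep only the first `depth` path segments of a URL.

    Assumption: the site's directory layout mirrors its concept hierarchy.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return "/" + "/".join(segments[:depth])


def compress_session(session: list[str], depth: int = 2) -> list[str]:
    """Map each click to its compressed URL, keeping first occurrences in order."""
    compressed: list[str] = []
    for url in session:
        item = compress_url(url, depth)
        if item not in compressed:
            compressed.append(item)
    return compressed


# Hypothetical session: four distinct pages, two of them under the same branch.
session = [
    "/courses/cs101/syllabus.html",
    "/courses/cs101/homework/hw1.html",
    "/courses/cs202/index.html",
    "/people/faculty/smith.html",
]
compressed = compress_session(session, depth=2)
print(len(session), "->", len(compressed), compressed)
```

In this toy session the two pages under /courses/cs101 become a single item, which shows both sides of the trade-off the abstract describes: fewer distinct items to mine, at the cost of no longer distinguishing the syllabus from the homework page.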