Discovery and Evaluation of Aggregate Usage Profiles for Web Personalization

  • Authors:
  • Bamshad Mobasher, Honghua Dai, Tao Luo, Miki Nakagawa

  • Affiliations:
  • School of Computer Science, Telecommunication, and Information Systems, DePaul University, Chicago, Illinois, USA (all authors); contact: mobasher@cti.depaul.edu

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2002

Abstract

Web usage mining, possibly used in conjunction with standard approaches to personalization such as collaborative filtering, can help address some of the shortcomings of these techniques, including reliance on subjective user ratings, lack of scalability, and poor performance on high-dimensional and sparse data. However, discovering patterns from usage data is not by itself sufficient for performing personalization tasks. The critical step is the effective derivation of high-quality and useful (i.e., actionable) “aggregate usage profiles” from these patterns. In this paper we present and experimentally evaluate two techniques, one based on clustering of user transactions and one based on clustering of pageviews, for discovering overlapping aggregate profiles that recommender systems can use effectively for real-time Web personalization. We evaluate these techniques both in terms of the quality of the individual profiles generated and in the context of providing recommendations as an integrated part of a personalization engine. In particular, our results indicate that the generated aggregate profiles make effective personalization possible at early stages of users' visits to a site, based only on anonymous clickstream data and without explicit input from these users or deeper knowledge about them.
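
To make the idea of an aggregate usage profile concrete, the following is a minimal sketch of the transaction-clustering route. It is an illustration under stated assumptions, not the paper's formulation. It assumes each transaction is a list of pageview identifiers and that cluster labels have already been produced by some clustering algorithm. The function names `build_profiles` and `recommend`, the `min_weight` threshold, and the cosine-style matching score are all hypothetical choices made for this example.

```python
from collections import Counter
from math import sqrt


def build_profiles(transactions, labels, min_weight=0.5):
    """Derive one weighted pageview profile per transaction cluster.

    Assumption: a pageview's weight is the fraction of the cluster's
    transactions that contain it; low-weight pageviews are dropped.
    """
    clusters = {}
    for pages, label in zip(transactions, labels):
        clusters.setdefault(label, []).append(set(pages))

    profiles = {}
    for label, members in clusters.items():
        counts = Counter(p for pages in members for p in pages)
        size = len(members)
        profiles[label] = {
            p: c / size for p, c in counts.items() if c / size >= min_weight
        }
    return profiles


def recommend(session, profiles):
    """Score unvisited pageviews against an anonymous active session.

    Assumption: the session is a binary vector over visited pageviews and
    is matched to each profile with a cosine-style similarity.
    """
    visited = set(session)
    scores = {}
    for profile in profiles.values():
        norm = sqrt(sum(w * w for w in profile.values())) * sqrt(len(visited))
        if norm == 0:
            continue
        match = sum(w for p, w in profile.items() if p in visited) / norm
        if match == 0:
            continue  # profile shares nothing with the session
        for page, weight in profile.items():
            if page not in visited:
                scores[page] = max(scores.get(page, 0.0), match * weight)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because a pageview can pass the threshold in more than one cluster, the resulting profiles may overlap, and a recommendation can be produced after a single click since only the current session's pageviews are needed.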