Mining history of changes to web access patterns

  • Authors: Qiankun Zhao; Sourav S. Bhowmick
  • Affiliations: Nanyang Technological University, 639798, Singapore (both authors)
  • Venue: PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
  • Year: 2004

Abstract

Recently, a great deal of work has been done in web usage mining [2]. Within this area, mining frequent Web Access Patterns (WAPs) is the most thoroughly researched problem [1]. The idea is to transform web logs into sequences of events with user identifications and timestamps, and then to extract association and sequential patterns from the event data according to certain metrics. Frequent WAPs have been applied to a wide range of applications such as personalization, system improvement, site modification, business intelligence, and usage characterization [2]. However, most existing techniques focus only on mining frequent WAPs from snapshot web usage data, whereas web usage data is dynamic in real life. While frequent WAPs are useful in many applications, the knowledge hidden behind the historical changes of web usage data, which reflects how WAPs change, is also critical to many applications such as the adaptive web, web site maintenance, and business intelligence.

In this paper, we propose a novel approach to discovering hidden knowledge from historical changes to WAPs. Rather than focusing on how often WAPs occur, we focus on web access patterns that change frequently. We define a novel type of knowledge, the Frequent Mutating WAP (FM-WAP), based on the historical changes of WAPs. The FM-WAP mining process consists of three phases. First, web usage data is represented as a set of WAP trees and partitioned into a sequence of WAP groups (subsets of the WAP trees) according to a user-defined calendar pattern, where each WAP group is represented as a WAP forest. Consequently, the log data is represented by a sequence of WAP forests called the WAP history. Second, changes across the WAP history are detected and stored in the global forest. Finally, the FM-WAPs are extracted by a traversal of the global forest. Extensive experiments show that the proposed approach can produce novel knowledge about web access patterns efficiently and with good scalability.
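The three phases can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: it uses flat access sequences instead of WAP trees and forests, a plain change counter instead of the global forest, and a change in relative support as a stand-in for whatever change measure the paper defines. All function names, the `min_delta` and `min_ratio` thresholds, and the sample data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def partition_by_period(sessions, period_of):
    """Phase 1 (simplified): group access sequences into time-ordered periods.

    Each period's list stands in for one WAP group; the ordered list of
    periods stands in for the WAP history.
    """
    groups = defaultdict(list)
    for timestamp, pages in sessions:
        groups[period_of(timestamp)].append(tuple(pages))
    return [groups[key] for key in sorted(groups)]


def support(group):
    """Relative frequency of each distinct access sequence within one period."""
    counts = Counter(group)
    total = len(group)
    return {pattern: n / total for pattern, n in counts.items()}


def frequently_mutating(history, min_delta, min_ratio):
    """Phases 2 and 3 (simplified): count changes, then keep frequent changers.

    A pattern "changes" between two consecutive periods when its support
    shifts by at least min_delta (hypothetical threshold). It is reported
    when it changes in at least min_ratio of all consecutive period pairs.
    """
    supports = [support(group) for group in history]
    transitions = len(supports) - 1
    if transitions < 1:
        return {}
    changes = Counter()
    for before, after in zip(supports, supports[1:]):
        for pattern in set(before) | set(after):
            delta = abs(after.get(pattern, 0.0) - before.get(pattern, 0.0))
            if delta >= min_delta:
                changes[pattern] += 1
    return {
        pattern: count / transitions
        for pattern, count in changes.items()
        if count / transitions >= min_ratio
    }


# Made-up sample data: (day number, pages visited in one session).
sessions = [
    (1, ["a", "b"]), (1, ["a", "b"]), (1, ["a", "c"]), (1, ["d"]),
    (2, ["a", "b"]), (2, ["a", "c"]), (2, ["a", "c"]), (2, ["d"]),
    (3, ["a", "b"]), (3, ["a", "b"]), (3, ["a", "c"]), (3, ["d"]),
]

# The "calendar pattern" here is simply one period per day.
history = partition_by_period(sessions, period_of=lambda day: day)
fm = frequently_mutating(history, min_delta=0.2, min_ratio=1.0)
print(fm)
```

In this toy data the sequences `a, b` and `a, c` swap popularity every day and are reported, while `d` keeps a constant support and is not. A snapshot-based frequent-pattern miner would see the first two only as moderately frequent sequences, which is the kind of distinction the abstract argues for.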