Web usage mining has attracted considerable research attention in recent years [2]. Within this area, mining frequent Web Access Patterns (WAPs) is the most thoroughly studied problem [1]. The idea is to transform web logs into sequences of events with user identifications and timestamps, and then to extract association and sequential patterns from the event data according to certain metrics. Frequent WAPs have been applied to a wide range of tasks, such as personalization, system improvement, site modification, business intelligence, and usage characterization [2]. However, most existing techniques mine frequent WAPs only from a snapshot of web usage data, whereas real-life web usage data is dynamic. Although frequent WAPs are useful in many applications, the knowledge hidden in the historical changes of web usage data, which reflects how WAPs change, is also critical to applications such as the adaptive web, web site maintenance, and business intelligence.

In this paper, we propose a novel approach to discovering hidden knowledge from historical changes to WAPs. Rather than focusing on how often WAPs occur, we focus on web access patterns that change frequently. We define a novel type of knowledge, the Frequent Mutating WAP (FM-WAP), based on the historical changes of WAPs. The FM-WAP mining process consists of three phases. First, web usage data is represented as a set of WAP trees and partitioned into a sequence of WAP groups (subsets of the WAP trees) according to a user-defined calendar pattern, where each WAP group is represented as a WAP forest. Consequently, the log data is represented by a sequence of WAP forests called the WAP history. Second, changes across the WAP history are detected and stored in a global forest. Finally, FM-WAPs are extracted by traversing the global forest. Extensive experiments show that the proposed approach produces novel knowledge of web access patterns efficiently and scales well.
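The three phases can be illustrated with a heavily simplified sketch. It is not the paper's algorithm: flat page sequences stand in for WAP trees, a list of per-period support tables stands in for the WAP history, and a dictionary of support deltas stands in for the global forest. The session data, the function names, and the thresholds `min_delta` and `min_ratio` are all assumptions made for illustration, since the abstract does not give the formal FM-WAP definition.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical sessions: (user id, day, ordered pages visited).
PAGES_BY_DAY = {
    date(2024, 1, 1): ["news", "news", "shop", "about"],
    date(2024, 1, 2): ["shop", "shop", "shop", "about"],
    date(2024, 1, 3): ["news", "news", "shop", "about"],
}
SESSIONS = [
    (f"u{i}", day, ("home", page))
    for day, pages in PAGES_BY_DAY.items()
    for i, page in enumerate(pages, start=1)
]


def build_history(sessions, calendar_key):
    """Phase 1 (simplified): partition sessions by a calendar key and
    return a time-ordered list of {pattern: relative support} tables."""
    groups = defaultdict(Counter)
    for _user, day, pages in sessions:
        groups[calendar_key(day)][pages] += 1
    history = []
    for key in sorted(groups):
        total = sum(groups[key].values())
        history.append({p: n / total for p, n in groups[key].items()})
    return history


def detect_changes(history):
    """Phase 2 (simplified): for every pattern, record the absolute change
    in support between each pair of consecutive groups."""
    patterns = set().union(*history)
    changes = defaultdict(list)
    for prev, curr in zip(history, history[1:]):
        for p in patterns:
            changes[p].append(abs(curr.get(p, 0.0) - prev.get(p, 0.0)))
    return changes


def frequently_mutating(changes, min_delta, min_ratio):
    """Phase 3 (simplified): keep patterns whose support changes by at least
    min_delta in at least min_ratio of the transitions (assumed criterion)."""
    result = []
    for p, deltas in changes.items():
        if deltas and sum(d >= min_delta for d in deltas) / len(deltas) >= min_ratio:
            result.append(p)
    return sorted(result)


history = build_history(SESSIONS, calendar_key=lambda d: d)  # daily groups
changes = detect_changes(history)
mutating = frequently_mutating(changes, min_delta=0.4, min_ratio=1.0)
print(mutating)  # [('home', 'news'), ('home', 'shop')]
```

In this toy data the pattern `("home", "about")` keeps a constant support of 0.25 and is filtered out, while the other two patterns swing by 0.5 at every transition. A coarser calendar pattern (for example weekly groups) could be tried by passing a different `calendar_key`.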