A Cube Model and Cluster Analysis for Web Access Sessions

Authors:
Joshua Zhexue Huang; Michael K. Ng; Wai-Ki Ching; Joe Ng; David Wai-Lok Cheung
Venue:
WEBKDD '01 Revised Papers from the Third International Workshop on Mining Web Log Data Across All Customers Touch Points
Year:
2001

Abstract

Identifying the navigational patterns of casual visitors is an important step in online recommendation, which aims to convert casual visitors into customers in e-commerce. Clustering and sequential analysis are two primary techniques for mining navigational patterns from Web and application server logs. The characteristics of the log data and of the mining tasks require new data representation methods and analysis algorithms to be tested in the e-commerce environment. In this paper we present a cube model to represent Web access sessions for data mining. The cube model organizes session data into three dimensions. The COMPONENT dimension represents a session as a set of ordered components {c1, c2, ..., cP}, in which each component ci indexes the ith visited page in the session. Each component is associated with a set of attributes describing the page it indexes, such as the page ID, the page category and the time spent viewing the page. The attributes associated with each component are defined in the ATTRIBUTE dimension. The SESSION dimension indexes individual sessions. In the model, irregular sessions are converted to a regular data structure to which existing data mining algorithms can be applied, while the order of the page sequences is maintained. A rich set of page attributes is embedded in the model for different analysis purposes. We also present some experimental results of using partitional clustering algorithms to cluster sessions. Because the sessions are essentially sequences of categories, two methods are used to cluster the categorical sequences: the k-modes algorithm, which is designed for clustering categorical data, and a clustering method based on the Markov transition frequency (or probability) matrix.
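The three dimensions described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the sample sessions, the attribute names, the helper function names, and in particular the choice to pad short sessions with an empty component and truncate long ones to a fixed number of components P (the abstract does not say how irregular sessions are regularized). The sketch also shows the simple matching dissimilarity and component-wise mode that k-modes relies on, and a transition frequency matrix for one category sequence.

```python
from collections import Counter

# Hypothetical sessions: ordered lists of (page_id, category, view_time_seconds).
sessions = [
    [("p1", "home", 5), ("p7", "books", 40), ("p9", "cart", 12)],
    [("p1", "home", 3), ("p4", "music", 25)],
    [("p1", "home", 8), ("p7", "books", 31), ("p8", "books", 20), ("p9", "cart", 9)],
]

ATTRIBUTES = ("page_id", "category", "view_time")  # the ATTRIBUTE dimension
PAD = (None, None, 0)  # assumed filler for components past the end of a session


def build_cube(sessions, num_components):
    """Return cube[s][c][a]: SESSION x COMPONENT x ATTRIBUTE, padded or truncated."""
    cube = []
    for session in sessions:
        components = list(session[:num_components])
        components += [PAD] * (num_components - len(components))
        cube.append(components)
    return cube


def attribute_slice(cube, attribute):
    """Extract one attribute for every session and component (a 2-D slice)."""
    a = ATTRIBUTES.index(attribute)
    return [[component[a] for component in session] for session in cube]


def matching_dissimilarity(x, y):
    """Simple matching distance used by k-modes: number of differing positions."""
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


def mode_of(rows):
    """Component-wise most frequent value of a group of categorical sequences."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*rows)]


def transition_counts(sequence, categories):
    """Transition frequency matrix of one category sequence (padding is skipped)."""
    index = {c: i for i, c in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in categories]
    for src, dst in zip(sequence, sequence[1:]):
        if src is not None and dst is not None:
            matrix[index[src]][index[dst]] += 1
    return matrix


cube = build_cube(sessions, num_components=4)
categories = attribute_slice(cube, "category")
```

In this sketch, `categories` is the regular table of category sequences that a categorical clustering routine could consume, while `transition_counts` gives an alternative per-session representation in which sessions of different lengths become matrices of the same shape.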