Sequence-based clustering for Web usage mining: A new experimental framework and ANN-enhanced K-means algorithm

Authors:
Sungjune Park;Nallan C. Suresh;Bong-Keun Jeong
Affiliations:
The University of North Carolina at Charlotte, Business Information Systems and Operations Management, The Belk College of Business, 9201 University City Blvd, Charlotte, NC 28223, United States;Department of Operations Management and Strategy, School of Management, State University of New York, Buffalo, NY 14260, United States;The University of North Carolina at Charlotte, Business Information Systems and Operations Management, The Belk College of Business, 9201 University City Blvd, Charlotte, NC 28223, United States
Venue:
Data & Knowledge Engineering
Year:
2008

Abstract

We develop a general sequence-based clustering method by proposing new sequence representation schemes in association with Markov models. The resulting sequence representations allow vector-based distances (dissimilarities) between Web user sessions to be calculated, and can therefore serve as inputs to various clustering algorithms. We develop an evaluation framework that uses a replicated clustering approach to compare algorithms by whether they correctly identify the clusters, where a cluster is a group of Web users who follow the same Markov process. A series of experiments investigates whether clustering performance is affected by the sequence representation and the distance measure, as well as by other factors: the number of actual Web user clusters, the number of Web pages, the similarity between clusters, the minimum session length, the number of user sessions, and the number of clusters to form. A new Fuzzy ART-enhanced K-means algorithm is also developed, and its superior performance is demonstrated.
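
The general idea can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not spell out the representation schemes, the distance measures, or the Fuzzy ART step. The sketch assumes one plausible representation (normalized first-order transition frequencies flattened into a vector), Euclidean distance, plain K-means in place of the ANN-enhanced variant, and cluster purity as the measure of whether the generating Markov processes are recovered. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def transition_vector(session, n_pages):
    """Assumed representation: normalized first-order transition
    frequencies of one session, flattened to length n_pages ** 2."""
    vec = [0.0] * (n_pages * n_pages)
    pairs = list(zip(session, session[1:]))
    for a, b in pairs:
        vec[a * n_pages + b] += 1.0
    if pairs:
        vec = [v / len(pairs) for v in vec]
    return vec


def euclidean(u, v):
    """Vector-based dissimilarity between two session vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def simulate_session(P, length, rng):
    """Generate one session from the Markov transition matrix P."""
    n = len(P)
    page = rng.randrange(n)
    session = [page]
    for _ in range(length - 1):
        page = rng.choices(range(n), weights=P[page])[0]
        session.append(page)
    return session


def kmeans(vectors, k, rng, iters=20):
    """Plain K-means (no Fuzzy ART seeding), for illustration only."""
    centers = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: euclidean(v, centers[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members)
                              for col in zip(*members)]
    return labels


def purity(labels, truth):
    """Share of sessions that fall in their cluster's majority group."""
    by_cluster = {}
    for lab, t in zip(labels, truth):
        by_cluster.setdefault(lab, Counter())[t] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)


if __name__ == "__main__":
    rng = random.Random(42)
    # Two hypothetical user groups: one cycles forward through three
    # pages, the other cycles backward.
    forward = [[0.05, 0.9, 0.05], [0.05, 0.05, 0.9], [0.9, 0.05, 0.05]]
    backward = [[0.05, 0.05, 0.9], [0.9, 0.05, 0.05], [0.05, 0.9, 0.05]]
    sessions, truth = [], []
    for group, P in enumerate((forward, backward)):
        for _ in range(50):
            sessions.append(simulate_session(P, 30, rng))
            truth.append(group)
    vectors = [transition_vector(s, 3) for s in sessions]
    labels = kmeans(vectors, 2, rng)
    print("purity:", purity(labels, truth))
```

Because the sessions are simulated from known transition matrices, the true group of every session is available, which is what makes a recovery-based comparison of representations, distance measures, and algorithms possible in this kind of setup.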