Evaluating the Markov assumption for web usage mining

Authors:
Søren Jespersen; Torben Bach Pedersen; Jesper Thorhauge
Affiliations:
Linkage Software; Aalborg University; Conzentrate
Venue:
WIDM '03 Proceedings of the 5th ACM international workshop on Web information and data management
Year:
2003

References:
Analyzing clickstreams using subsessions. Proceedings of the 3rd ACM international workshop on Data warehousing and OLAP.
A fine grained heuristic to capture web navigation patterns. ACM SIGKDD Explorations Newsletter.
Introduction to Algorithms.
Statistical Language Learning.
A Popularity-Based Prediction Model for Web Prefetching. Computer.
A Hybrid Approach to Web Usage Mining. DaWaK 2000 Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery.
Web usage mining: discovery and applications of usage patterns from Web data. ACM SIGKDD Explorations Newsletter.
Web Mining: Information and Pattern Discovery on the World Wide Web. ICTAI '97 Proceedings of the 9th International Conference on Tools with Artificial Intelligence.

Abstract

Web usage mining concerns the discovery of common browsing patterns, i.e., pages requested in sequence, from web logs. To cope with the enormous amounts of data, several aggregated structures based on statistical models of web surfing have been proposed, e.g., the Hypertext Probabilistic Grammar (HPG) model [2]. These techniques typically rely on the Markov assumption with history depth n, i.e., the assumption that the next requested page depends only on the last n pages visited. This assumption does not always hold, so false browsing patterns may be discovered. However, to our knowledge, there has been no systematic study of the validity of the Markov assumption with respect to web usage mining and of the resulting quality of the mined browsing patterns.

In this paper we systematically investigate the quality of browsing patterns mined from structures based on the Markov assumption. Formal measures of quality, based on the closeness of the mined patterns to the true traversal patterns, are defined, and an extensive experimental evaluation is performed on two substantial real-world data sets. The results indicate that a large number of rules must be considered to achieve high quality, that long rules are generally more distorted than short rules, and that the model yields knowledge of higher quality when applied to more random usage patterns. Thus we conclude that Markov-based structures for web usage mining are best suited for tasks demanding less accuracy, such as pre-fetching, personalization, and targeted ads.
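To make the Markov assumption concrete, the following is a minimal Python sketch of how a history-depth-n model can produce a false browsing pattern. It is an illustration only, not the authors' HPG implementation or their quality measures. Assumptions: sessions are lists of page identifiers; only paths that begin at a session's first page are scored (unlike the HPG model, which also allows mined paths to begin mid-session); and the function names and toy log are hypothetical. The sketch compares the probability the model assigns to a path with the fraction of sessions that actually followed it.

```python
from collections import Counter, defaultdict

def build_markov(sessions, n=1):
    """Estimate an order-n Markov model from clickstream sessions.

    Returns (starts, trans): `starts` counts the first n pages of each
    session; `trans[history][page]` counts how often `page` followed the
    length-n `history`.
    """
    starts = Counter()
    trans = defaultdict(Counter)
    for s in sessions:
        if len(s) < n:
            continue  # too short to provide a full history
        starts[tuple(s[:n])] += 1
        for i in range(n, len(s)):
            trans[tuple(s[i - n:i])][s[i]] += 1
    return starts, trans

def markov_path_prob(path, starts, trans, n=1):
    """Probability that a session starts with `path`, assuming the next
    page depends only on the previous n pages (the Markov assumption)."""
    p = starts[tuple(path[:n])] / sum(starts.values())
    for i in range(n, len(path)):
        row = trans.get(tuple(path[i - n:i]))
        if not row:
            return 0.0  # history never observed in the log
        p *= row[path[i]] / sum(row.values())
    return p

def true_path_prob(path, sessions):
    """Fraction of sessions that actually start with `path`."""
    k = len(path)
    return sum(1 for s in sessions if list(s[:k]) == list(path)) / len(sessions)

if __name__ == "__main__":
    # Hypothetical toy log: two users pass through page B.
    sessions = [["A", "B", "C"], ["D", "B", "E"]]
    s1, t1 = build_markov(sessions, n=1)
    s2, t2 = build_markov(sessions, n=2)
    # A -> B -> E was never followed, yet the first-order model scores it.
    print(markov_path_prob(["A", "B", "E"], s1, t1, n=1))  # 0.25
    print(true_path_prob(["A", "B", "E"], sessions))        # 0.0
    # A history of depth 2 remembers that A preceded B and rules it out.
    print(markov_path_prob(["A", "B", "E"], s2, t2, n=2))  # 0.0
```

With n = 1 the model forgets that A preceded B, so it mixes the two sessions and assigns A → B → E a probability of 0.25 even though no user followed that path. Raising n to 2 removes this particular false pattern, at the cost of more states to estimate from sparser counts.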