DSM-PLW: single-pass mining of path traversal patterns over streaming web click-sequences

Authors:
Hua-Fu Li;Suh-Yin Lee;Man-Kwan Shan
Affiliations:
Department of Computer Science and Information Engineering, National Chiao-Tung University, Hsinchu, Taiwan, ROC;Department of Computer Science and Information Engineering, National Chiao-Tung University, Hsinchu, Taiwan, ROC;Department of Computer Science, National Chengchi University, Wenshan, Taipei, Taiwan, ROC
Venue:
Computer Networks: The International Journal of Computer and Telecommunications Networking - Web dynamics
Year:
2006

Abstract

Mining Web click streams is an important data mining problem with broad applications. It is also a difficult problem, because streaming data have several challenging characteristics: unknown or unbounded length, a possibly very fast arrival rate, no ability to backtrack over previously arrived click-sequences, and no system control over the order in which the data arrive. In this paper, we propose a projection-based, single-pass algorithm, called DSM-PLW (Data Stream Mining for Path traversal patterns in a Landmark Window), for online incremental mining of path traversal patterns over a continuous stream of maximal forward references generated at a rapid rate. The algorithm projects each maximal forward reference of the stream into a set of reference-suffix maximal forward references and inserts these into a new in-memory summary data structure, called SP-forest (Summary Path traversal pattern forest). The SP-forest is an extended prefix tree-based data structure that stores essential information about the frequent reference sequences of the stream so far. The set of all maximal reference sequences is determined from the SP-forest by a depth-first-search mechanism, called MRS-mining (Maximal Reference Sequence mining). Theoretical analysis and experimental studies show that the proposed algorithm has gently growing memory requirements and makes only one pass over the streaming data.
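
The pipeline outlined in the abstract (suffix projection, insertion into a prefix-tree forest, depth-first extraction of maximal sequences) can be illustrated with a minimal Python sketch. This is not the authors' SP-forest or MRS-mining procedure, whose details the abstract does not give. It is a plain counting trie under stated assumptions: the names `Node`, `SuffixForest`, `insert`, `maximal_sequences`, and the absolute threshold `min_count` are all hypothetical, no page repeats within a maximal forward reference, and nothing is pruned, so unlike the summary structure the abstract describes, memory is not bounded.

```python
class Node:
    """One page in the forest; count = number of sequences containing the path to it."""
    def __init__(self):
        self.count = 0
        self.children = {}


class SuffixForest:
    """Simplified stand-in for a summary forest (assumption: no pruning)."""

    def __init__(self):
        self.roots = {}  # first page of a suffix -> Node

    def insert(self, mfr):
        """Single-pass update: project one maximal forward reference into
        all of its suffixes and add each suffix to the forest."""
        for start in range(len(mfr)):
            level = self.roots
            for page in mfr[start:]:
                node = level.setdefault(page, Node())
                node.count += 1
                level = node.children

    def maximal_sequences(self, min_count):
        """Depth-first search for frequent paths that cannot be extended,
        then drop any path that is a proper suffix of another one."""
        candidates = []

        def dfs(level, path):
            extended = False
            for page, node in level.items():
                if node.count >= min_count:
                    extended = True
                    dfs(node.children, path + (page,))
            if not extended and path:
                candidates.append(path)

        dfs(self.roots, ())

        def is_proper_suffix(p, q):
            return len(p) < len(q) and q[-len(p):] == p

        return sorted(
            p for p in candidates
            if not any(is_proper_suffix(p, q) for q in candidates)
        )


# Hypothetical stream of maximal forward references (pages named A..E).
forest = SuffixForest()
for mfr in ["ABCD", "ABC", "BCE", "ABCD", "BCE"]:
    forest.insert(tuple(mfr))

result = forest.maximal_sequences(min_count=2)
print(result)  # [('A', 'B', 'C', 'D'), ('B', 'C', 'E')]
```

Because every suffix is inserted, the count at a node equals the number of inserted sequences that contain the path to that node as a contiguous run. A frequent path with no frequent child can therefore only be covered by a longer frequent sequence that ends with it, which is why the final filter checks suffixes only.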