FS-Miner: efficient and incremental mining of frequent sequence patterns in web logs

Authors:
Maged El-Sayed;Carolina Ruiz;Elke A. Rundensteiner
Affiliations:
Worcester Polytechnic Institute;Worcester Polytechnic Institute;Worcester Polytechnic Institute
Venue:
Proceedings of the 6th annual ACM international workshop on Web information and data management
Year:
2004

Abstract

Mining frequent patterns is an important component of many prediction systems. One common usage in web applications is the mining of users' access behavior for the purpose of predicting and hence pre-fetching the web pages that the user is likely to visit. In this paper we introduce an efficient strategy for discovering frequent patterns in sequence databases that requires only two scans of the database. The first scan obtains support counts for subsequences of length two. The second scan extracts potentially frequent sequences of any length and represents them as a compressed frequent sequences tree structure (FS-tree). Frequent sequence patterns are then mined from the FS-tree. Incremental and interactive mining functionalities are also facilitated by the FS-tree. As part of this work, we developed the FS-Miner, a system that discovers frequent sequences from web log files. The FS-Miner has the ability to adapt to changes in users' behavior over time, in the form of new input sequences, and to respond incrementally without the need to perform full re-computation. Our system also allows the user to change the input parameters (e.g., minimum support and desired pattern size) interactively without requiring full re-computation in most cases. We have tested our system comparing it against two other algorithms from the literature. Our experimental results show that our system scales up linearly with the size of the input database. Furthermore, it exhibits excellent adaptability to support threshold decreases. We also show that the incremental update capability of the system provides significant performance advantages over full re-computation even for relatively large update sizes.
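
The two-scan idea described in the abstract can be illustrated with a small Python sketch. This is not the authors' FS-tree or the FS-Miner implementation. It is a simplified illustration under stated assumptions: "subsequences of length two" are taken to be pairs of consecutive pages in a session, support is counted once per session, and the compressed tree is replaced by a plain prefix tree built from nested dictionaries. The names `count_pairs`, `split_on_infrequent`, `build_prefix_tree`, and `min_support` are hypothetical.

```python
from collections import Counter


def count_pairs(sessions):
    """First scan: count how many sessions contain each consecutive page pair."""
    counts = Counter()
    for session in sessions:
        # set() so a pair repeated inside one session is counted once (assumption)
        counts.update(set(zip(session, session[1:])))
    return counts


def split_on_infrequent(session, frequent_pairs):
    """Cut a session wherever a consecutive pair is not frequent.

    Only pieces with at least two pages are kept, since a single page
    cannot contribute to a sequence pattern.
    """
    pieces = []
    current = list(session[:1])
    for a, b in zip(session, session[1:]):
        if (a, b) in frequent_pairs:
            current.append(b)
        else:
            if len(current) > 1:
                pieces.append(current)
            current = [b]
    if len(current) > 1:
        pieces.append(current)
    return pieces


def build_prefix_tree(sessions, min_support):
    """Second scan: insert the surviving pieces into a counted prefix tree."""
    counts = count_pairs(sessions)
    frequent_pairs = {pair for pair, c in counts.items() if c >= min_support}
    root = {}
    for session in sessions:
        for piece in split_on_infrequent(session, frequent_pairs):
            node = root
            for page in piece:
                child = node.setdefault(page, {"count": 0, "children": {}})
                child["count"] += 1
                node = child["children"]
    return root, counts
```

In this sketch, shared prefixes such as the same first two pages of several sessions collapse into a single path whose counters accumulate. That is the sense in which a tree can summarize the database after only two passes. Keeping the pair counts from the first scan is also what would make it possible, in principle, to revisit the threshold without rescanning for those counts.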