Efficient Algorithms for Mining and Incremental Update of Maximal Frequent Sequences

Authors:
Ben Kao;Minghua Zhang;Chi-Lap Yip;David W. Cheung;Usama Fayyad
Affiliations:
Department of Computer Science, The University of Hong Kong;Department of Computer Science, The University of Hong Kong;Department of Computer Science, The University of Hong Kong;Department of Computer Science, The University of Hong Kong;Department of Computer Science, The University of Hong Kong
Venue:
Data Mining and Knowledge Discovery
Year:
2005

Citing 16
Cited 5

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Discovering Patterns from Large and Dynamic Sequential Data

Journal of Intelligent Information Systems
Efficient progressive sampling

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
An efficient algorithm to update large itemsets with early pruning

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Incremental and interactive sequence mining

Proceedings of the eighth international conference on Information and knowledge management
SPADE: an efficient algorithm for mining frequent sequences

Machine Learning
Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules

Data Mining and Knowledge Discovery
Mining Sequential Patterns: Generalizations and Performance Improvements

EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
Mining Sequential Patterns

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique

ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth

Proceedings of the 17th International Conference on Data Engineering
SPIRIT: Sequential Pattern Mining with Regular Expression Constraints

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Efficient Mining of Association Rules in Large Dynamic Databases

BNCOD 16 Proceedings of the 16th British National Conference on Databases: Advances in Databases
Efficient Algorithms for Incremental Update of Frequent Sequences

PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
A General Incremental Technique for Maintaining Discovered Association Rules

Proceedings of the Fifth International Conference on Database Systems for Advanced Applications (DASFAA)
An Adaptive Algorithm for Incremental Mining of Association Rules

DEXA '98 Proceedings of the 9th International Workshop on Database and Expert Systems Applications

Fast discovery of sequential patterns in large databases using effective time-indexing

Information Sciences: an International Journal
Efficient algorithms for incremental maintenance of closed sequential patterns in large databases

Data & Knowledge Engineering
IMCS: incremental mining of closed sequential patterns

APWeb/WAIM'07 Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management
Efficient incremental mining of frequent sequence generators

DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications - Volume Part I
User Behaviour Pattern Mining from Weblog

International Journal of Data Warehousing and Mining

Abstract

We study two problems: (1) mining frequent sequences from a transactional database, and (2) incremental update of frequent sequences when the underlying database changes over time. We review existing sequence mining algorithms including GSP, PrefixSpan, SPADE, and ISM. We point out the large memory requirement of PrefixSpan, SPADE, and ISM, and evaluate the performance of GSP. We discuss the high I/O cost of GSP, particularly when the database contains long frequent sequences. To reduce the I/O requirement, we propose an algorithm, MFS, which can be considered a generalization of GSP. The general strategy of MFS is to first find an approximate solution to the set of frequent sequences and then perform successive refinement until the exact set of frequent sequences is obtained. We show that this successive refinement approach results in a significant improvement in I/O cost. We discuss how MFS can be applied to the incremental update problem. In particular, the result of a previous mining exercise can be used (by MFS) as a good initial approximate solution for the mining of an updated database. This results in an I/O-efficient algorithm. To improve processing efficiency, we devise pruning techniques that, when coupled with GSP or MFS, result in algorithms that are both CPU- and I/O-efficient.
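To make the I/O argument concrete, the following is a minimal Python sketch of a GSP-style level-wise miner that performs one database scan per pattern length, so a long frequent sequence forces many scans. It is an illustration under simplifying assumptions, not the paper's implementation: sequences are tuples of single items rather than itemsets, the candidate pruning step is omitted, and all function names (`is_subsequence`, `count_support`, `mine_levelwise`, `maximal`) are hypothetical. The abstract does not specify how MFS generates or refines candidates, so that part is not reproduced here; the sketch only shows that a single scan can count candidates of mixed lengths, which is what makes a previous mining result usable as an initial estimate.

```python
def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def count_support(candidates, database):
    """Count every candidate in ONE pass over the database.

    Candidates may have mixed lengths, so an estimate taken from an
    earlier mining run can be verified with a single scan.
    """
    counts = {c: 0 for c in candidates}
    for seq in database:
        for c in candidates:
            if is_subsequence(c, seq):
                counts[c] += 1
    return counts


def mine_levelwise(database, min_count):
    """GSP-style baseline (simplified): one scan per pattern length."""
    items = sorted({x for seq in database for x in seq})
    candidates = [(x,) for x in items]
    frequent, scans = {}, 0
    while candidates:
        counts = count_support(candidates, database)
        scans += 1
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: p and q combine when p without its first item
        # equals q without its last item (pruning step omitted).
        candidates = sorted(
            {p + (q[-1],) for p in level for q in level if p[1:] == q[:-1]}
        )
    return frequent, scans


def maximal(frequent):
    """Keep frequent sequences not contained in another frequent one."""
    return {
        s for s in frequent
        if not any(s != t and is_subsequence(s, t) for t in frequent)
    }


if __name__ == "__main__":
    db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "c")]
    frequent, scans = mine_levelwise(db, min_count=2)
    print(scans, maximal(frequent))
    # A prior result used as an estimate is checked in a single scan:
    print(count_support([("a", "b", "c")], db))
```

On this toy database the level-wise miner needs one scan for each pattern length up to the longest frequent sequence, whereas checking a previously mined sequence against the data takes a single scan.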