Efficient algorithms for incremental maintenance of closed sequential patterns in large databases

Authors:
Lei Chang;Tengjiao Wang;Dongqing Yang;Hua Luan;Shiwei Tang
Affiliations:
School of Electronics Engineering and Computer Science, Peking University, Beijing, China and Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Ch ...;School of Electronics Engineering and Computer Science, Peking University, Beijing, China and Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Ch ...;School of Electronics Engineering and Computer Science, Peking University, Beijing, China and Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Ch ...;School of Information, Renmin University of China, Beijing, China;School of Electronics Engineering and Computer Science, Peking University, Beijing, China and Key Laboratory of Machine Perception (Ministry of Education), Peking University
Venue:
Data & Knowledge Engineering
Year:
2009

Abstract

Recent studies show that mining compact frequent patterns (such as closed patterns and compressed patterns) can alleviate the interpretability and efficiency problems encountered by traditional frequent pattern mining methods. Compact frequent patterns preserve the exact or approximate supports of the complete set of frequent patterns, yet they are often orders of magnitude fewer in number. Several efficient algorithms have been proposed to mine compact sequential patterns. However, sequence databases are not always static: sequences (or items) are often added to and deleted from them, and even a slight change to a database may change its compact patterns. Re-mining from scratch after every update is very time-consuming and thus infeasible. In this paper, we explore how to efficiently maintain closed sequential patterns in a dynamic sequence database environment. A compact structure, CSTree, is designed to keep closed sequential patterns, and its properties are carefully studied. Two efficient algorithms, IMCS_A and IMCS_D, are developed to maintain the CSTree upon incremental update. The algorithms exploit the properties of CSTree to find nodes whose states are obsolete and to avoid unnecessary node extension and closure checking operations, which accelerates the incremental update process. A thorough experimental study on various real and synthetic datasets shows that the proposed algorithms outperform the state-of-the-art algorithms PrefixSpan, CloSpan, and BIDE, as well as the recently proposed incremental mining algorithm IncSpan, by about a factor of 4 to more than an order of magnitude.
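
To make the notion of a closed sequential pattern concrete, the following is a minimal brute-force sketch and not the paper's CSTree or IMCS algorithms. It assumes each sequence is a string of single items (no itemsets) and takes a pattern to be closed when no proper super-sequence has the same support. All function names are hypothetical and chosen for illustration only. It shows how inserting a single sequence can change the set of closed patterns, which is the situation incremental maintenance is meant to handle without re-mining.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of database sequences that contain pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def frequent_patterns(database, min_sup):
    """Brute force: enumerate every subsequence of every sequence, keep the frequent ones."""
    candidates = set()
    for seq in database:
        for length in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), length):
                candidates.add(tuple(seq[i] for i in idx))
    counted = {p: support(p, database) for p in candidates}
    return {p: s for p, s in counted.items() if s >= min_sup}


def closed_patterns(database, min_sup):
    """Keep frequent patterns that have no longer super-sequence with equal support."""
    freq = frequent_patterns(database, min_sup)
    return {
        p: s
        for p, s in freq.items()
        if not any(
            len(q) > len(p) and freq[q] == s and is_subsequence(p, q)
            for q in freq
        )
    }


# Toy database (an assumption for illustration, not data from the paper).
database = ["abc", "abc", "ab"]
before = closed_patterns(database, min_sup=2)

# Inserting one sequence and re-mining from scratch: the closed set changes.
after = closed_patterns(database + ["ac"], min_sup=2)

print(sorted(before.items()))
print(sorted(after.items()))
```

On this toy input the closed set grows from two patterns (`ab` and `abc`) to four (`a`, `ab`, `ac`, `abc`) after a single insertion. The sketch recomputes everything on each call, which is exactly the cost that an incremental approach tries to avoid.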