Effective database transformation and efficient support computation for mining sequential patterns

Authors:
Chung-Wen Cho;Yi-Hung Wu;Arbee L. P. Chen
Affiliations:
Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan;Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan;Department of Computer Science, National Chengchi University, Taipei, Taiwan
Venue:
DASFAA'05 Proceedings of the 10th international conference on Database Systems for Advanced Applications
Year:
2005

Citing 8
Cited 2

SPADE: an efficient algorithm for mining frequent sequences

Machine Learning
Mining Sequential Patterns: Generalizations and Performance Improvements

EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
Mining Sequential Patterns

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth

Proceedings of the 17th International Conference on Data Engineering
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Top Down FP-Growth for Association Rule Mining

PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Sequential PAttern mining using a bitmap representation

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
An Efficient Algorithm for Mining Frequent Sequences by a New Strategy without Support Counting

ICDE '04 Proceedings of the 20th International Conference on Data Engineering

Effective database transformation and efficient support computation for mining sequential patterns

Journal of Intelligent Information Systems
Mining sequential patterns across multiple sequence databases

Data & Knowledge Engineering

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we introduce a novel algorithm for mining sequential patterns from transaction databases. Since the FP-tree based approach is efficient in mining frequent itemsets, we adapt it to find frequent 1-sequences. For efficient frequent k-sequence mining, every frequent 1-sequence is encoded as a unique symbol and the database is transformed into one in the symbolic form. We observe that it is unnecessary to encode all the frequent 1-seqences, and make full use of the discovered frequent 1-sequences to transform the database into one with a smallest size. To discover the frequent k-sequences, we design a tree structure to store the candidates. Each customer sequence is then scanned to decide whether the candidates are frequent k-sequences. We propose a technique to avoid redundantly enumerating the identical k-subsequences from a customer sequence to speed up the process. Moreover, the tree structure is designed in a way such that the supports of the candidates can be incremented for a customer sequence by a single sequential traversal of the tree. The experiment results show that our approach outperforms the previous works in various aspects including the scalability and the execution time.