Effective database transformation and efficient support computation for mining sequential patterns

Authors:
Chung-Wen Cho;Yi-Hung Wu;Arbee L. Chen
Affiliations:
Department of Computer Science, National Tsing Hua University, Hsinchu, Republic of China;Department of Information and Computer Engineering, Chung Yuan Christian University, Jhongli, Republic of China;Department of Computer Science, National Chengchi University, Taipei, Republic of China
Venue:
Journal of Intelligent Information Systems
Year:
2009

Abstract

In this paper, we propose a novel algorithm for mining frequent sequences from transaction databases. The transactions of the same customer form a customer sequence, and the database is thus viewed as a set of customer sequences. A sequence (an ordered list of itemsets) is frequent if the number of customer sequences containing it satisfies the user-specified threshold. The 1-sequence is a special type of sequence because it consists of only a single itemset instead of an ordered list, while the k-sequence is a sequence composed of k itemsets. Compared with the cost of mining frequent k-sequences (k ≥ 2), the cost of mining frequent 1-sequences is negligible. We adopt a two-phase architecture that finds the two types of frequent sequences separately, so that the discovery of frequent k-sequences can be designed and optimized on its own. For efficient frequent k-sequence mining, every frequent 1-sequence is encoded as a unique symbol and the database is transformed into one composed of these symbols. We find that it is unnecessary to encode all the frequent 1-sequences, and we make full use of the discovered frequent 1-sequences to transform the database into a smaller one. For every k ≥ 2, the customer sequences in the transformed database are scanned to find all the frequent k-sequences. We devise a compact representation for a customer sequence and describe a method to enumerate all distinct subsequences of a customer sequence without redundant scans. The soundness of the proposed approach is verified and a number of experiments are performed. The results show that our approach outperforms previous work in both scalability and execution time.
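The definitions in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it counts support by brute force, finds frequent 1-sequences by enumerating every subset of every transaction, and encodes all frequent 1-sequences as symbols, whereas the paper states that encoding all of them is unnecessary. The function names (`contains`, `support`, `frequent_1_sequences`, `transform`) and the representation of a customer sequence as a list of `frozenset` objects are assumptions made only for this illustration.

```python
from itertools import combinations


def contains(customer_seq, pattern):
    """Return True if `pattern` (a list of itemsets) is contained in
    `customer_seq`: each pattern itemset must be a subset of a distinct
    transaction, with order preserved. Greedy earliest matching suffices."""
    pos = 0
    for itemset in pattern:
        while pos < len(customer_seq) and not itemset <= customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1  # the next pattern itemset must match a later transaction
    return True


def support(db, pattern):
    """Number of customer sequences in `db` that contain `pattern`."""
    return sum(1 for seq in db if contains(seq, pattern))


def frequent_1_sequences(db, min_support):
    """Naive phase 1 (illustrative only): count every non-empty itemset
    once per customer sequence and keep those meeting the threshold."""
    counts = {}
    for seq in db:
        seen = set()
        for transaction in seq:
            items = sorted(transaction)
            for size in range(1, len(items) + 1):
                for combo in combinations(items, size):
                    seen.add(frozenset(combo))
        for itemset in seen:
            counts[itemset] = counts.get(itemset, 0) + 1
    return {s for s, c in counts.items() if c >= min_support}


def transform(db, frequent):
    """Assign each frequent 1-sequence an integer symbol and rewrite every
    transaction as the set of symbols whose itemsets it contains.
    Transactions containing no frequent itemset are dropped.
    Assumption: all frequent 1-sequences are encoded (no pruning)."""
    ordered = sorted(frequent, key=lambda s: (len(s), sorted(s)))
    symbols = {s: code for code, s in enumerate(ordered)}
    transformed = []
    for seq in db:
        new_seq = []
        for transaction in seq:
            codes = frozenset(c for s, c in symbols.items() if s <= transaction)
            if codes:
                new_seq.append(codes)
        transformed.append(new_seq)
    return symbols, transformed


# Hypothetical toy database: three customers, items are single letters.
db = [
    [frozenset("ab"), frozenset("c")],
    [frozenset("a"), frozenset("c")],
    [frozenset("ab"), frozenset("d")],
]
frequent = frequent_1_sequences(db, min_support=2)
symbols, transformed = transform(db, frequent)
```

With a threshold of 2, the itemset {d} is infrequent in this toy database, so the transaction holding it disappears from the third transformed sequence. This shows in miniature how the transformation can shrink the database before frequent k-sequences (k ≥ 2) are mined.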