Mining Compressed Repetitive Gapped Sequential Patterns Efficiently

Authors:
Yongxin Tong; Li Zhao; Dan Yu; Shilong Ma; Zhiyuan Cheng; Ke Xu
Affiliations:
State Key Lab. of Software Development Environment, Beihang University, Beijing 100191 (all authors)
Venue:
ADMA '09: Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Year:
2009

Abstract

Mining frequent sequential patterns from sequence databases has been a central research topic in data mining, and many efficient sequential pattern mining algorithms have been proposed and studied. Recently, a new sequential pattern mining problem, mining repetitive gapped subsequences, has attracted the attention of many researchers. However, the number of repetitive gapped subsequences generated even by closed mining algorithms may be too large for users to understand, especially when the support threshold is low. In this paper, we study the problem of compressing repetitive gapped sequential patterns. We propose a novel distance measure for repetitive gapped sequential patterns and an efficient representative pattern checking scheme, δ-dominate sequential pattern checking. We also develop an efficient algorithm, CRGSgrow (Compressing Repetitive Gapped Sequential pattern grow), which includes an efficient pruning strategy, SyncScan. An empirical study on both real and synthetic data sets shows that CRGSgrow performs well.
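The abstract names the ingredients of the compression step (a distance between patterns and a check for whether a representative pattern covers another) without defining them. The sketch below illustrates only the general idea under stated assumptions. It is not the paper's distance measure, its representative check, or CRGSgrow, and it does not model SyncScan pruning. The specific assumptions are as follows. A pattern P is covered by a representative R when P is a gapped subsequence of R and the hypothetical distance 1 - sup(R)/sup(P) is at most a threshold `delta`. Supports are assumed to be precomputed and anti-monotone. Representatives are chosen greedily with the longest patterns first. The names `is_subsequence`, `distance`, `dominates`, and `compress` are all hypothetical.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(p: Pattern, q: Pattern) -> bool:
    """True if p can be obtained from q by deleting events (gaps allowed)."""
    it = iter(q)
    # `e in it` advances the iterator, so events must match in order.
    return all(e in it for e in p)


def distance(sup_p: int, sup_r: int) -> float:
    # Hypothetical measure (an assumption, not the paper's definition):
    # with an anti-monotone support, a super-pattern R has sup(R) <= sup(P),
    # so the value lies in [0, 1).
    return 1.0 - sup_r / sup_p


def dominates(r: Pattern, p: Pattern, support: Dict[Pattern, int], delta: float) -> bool:
    """Assumed coverage test: p is a subsequence of r and close enough in support."""
    return is_subsequence(p, r) and distance(support[p], support[r]) <= delta


def compress(support: Dict[Pattern, int], delta: float) -> List[Pattern]:
    """Greedy representative selection (illustrative only).

    Longer patterns are visited first, so any pattern that could cover p
    (a proper super-pattern) has already been considered. A pattern becomes a
    representative unless an existing representative covers it.
    """
    reps: List[Pattern] = []
    for p in sorted(support, key=lambda x: (-len(x), x)):
        if not any(dominates(r, p, support, delta) for r in reps):
            reps.append(p)
    return reps


if __name__ == "__main__":
    # Made-up supports for illustration; not data from the paper.
    support = {("A",): 10, ("B",): 9, ("C",): 6, ("A", "B"): 8, ("A", "B", "C"): 5}
    print(compress(support, 0.25))
```

With `delta = 0.25` the toy input keeps `("A", "B", "C")` and `("A", "B")`. The single events are covered because their supports are close to those of a kept super-pattern. Setting `delta = 0.0` keeps every pattern whose support differs from all of its super-patterns. Raising `delta` trades fidelity for a smaller output, which is the tension the abstract describes for low support thresholds.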