Mining Compressed Repetitive Gapped Sequential Patterns Efficiently

  • Authors:
  • Yongxin Tong; Li Zhao; Dan Yu; Shilong Ma; Zhiyuan Cheng; Ke Xu

  • Affiliations:
  • State Key Lab. of Software Development Environment, Beihang University, Beijing 100191 (all authors)

  • Venue:
  • ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
  • Year:
  • 2009

Abstract

Mining frequent sequential patterns from sequence databases has been a central research topic in data mining, and various efficient sequential pattern mining algorithms have been proposed and studied. Recently, a new line of sequential pattern mining research, called mining repetitive gapped subsequences, has attracted the attention of many researchers. However, the number of repetitive gapped subsequences generated, even by closed-pattern mining algorithms, may be too large for users to understand, especially when the support threshold is low. In this paper, we pose the problem of compressing repetitive gapped sequential patterns. We propose a novel distance measure for repetitive gapped sequential patterns and an efficient representative pattern checking scheme, δ-dominate sequential pattern checking. We also develop an efficient algorithm, CRGSgrow (Compressing Repetitive Gapped Sequential pattern grow), which includes an efficient pruning strategy, SyncScan. An empirical study with both real and synthetic data sets shows that CRGSgrow performs well.