An Efficient Algorithm for Mining Frequent Sequences by a New Strategy without Support Counting

  • Authors:
  • Ding-Ying Chiu; Yi-Hung Wu; Arbee L. P. Chen

  • Venue:
  • ICDE '04 Proceedings of the 20th International Conference on Data Engineering
  • Year:
  • 2004

Abstract

Mining sequential patterns in large databases is an important research topic. The main challenge of mining sequential patterns is the high processing cost due to the large amount of data. In this paper, we propose a new strategy called DIrect Sequence Comparison (abbreviated as DISC), which can find frequent sequences without having to compute the support counts of non-frequent sequences. The main difference between the DISC strategy and previous work is the way non-frequent sequences are pruned. Previous work is based on the anti-monotone property, which prunes non-frequent sequences according to the frequent sequences of shorter lengths. In contrast, the DISC strategy prunes non-frequent sequences according to other sequences of the same length. Moreover, we summarize three strategies used in previous work and design an efficient algorithm called DISC-all to take advantage of all four strategies. The experimental results show that the DISC-all algorithm outperforms the PrefixSpan algorithm on mining frequent sequences in large databases. In addition, we analyze these strategies to design a dynamic version of our algorithm, which achieves a much better performance.
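
For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal sketch of conventional anti-monotone pruning with explicit support counting. It is not the DISC strategy or the DISC-all algorithm from the paper. It assumes a simplified setting in which every element of a sequence is a single item rather than an itemset, and the names `is_subsequence`, `support`, `mine_level`, and `mine` are hypothetical helpers introduced only for illustration.

```python
def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` consumes the iterator, so later items must appear later.
    return all(item in it for item in pattern)


def support(pattern, database):
    """Count the sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def mine_level(frequent_prev, items, database, min_support):
    """Extend frequent (k-1)-sequences by one item to get frequent k-sequences."""
    prev = set(frequent_prev)
    result = []
    for prefix in frequent_prev:
        for item in items:
            cand = prefix + (item,)
            # Anti-monotone property: if any (k-1)-subsequence of the
            # candidate is non-frequent, the candidate cannot be frequent,
            # so its support is never counted.
            if any(cand[:i] + cand[i + 1:] not in prev for i in range(len(cand))):
                continue
            if support(cand, database) >= min_support:
                result.append(cand)
    return result


def mine(database, min_support):
    """Level-wise mining: start from the empty sequence and grow by one item."""
    items = sorted({x for seq in database for x in seq})
    frequent, level = [], [()]
    while level:
        level = mine_level(level, items, database, min_support)
        frequent.extend(level)
    return frequent


# Toy database (an assumption, not data from the paper).
database = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
frequent = mine(database, min_support=3)
```

In this sketch, candidates that survive the anti-monotone check still have their support counted even when they turn out to be non-frequent, for example `("a", "b")` in the toy database. That counting cost is what the abstract says the DISC strategy avoids.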