Efficient sequential pattern mining algorithms

  • Authors:
  • Renata Ivancsy;Istvan Vajk

  • Affiliations:
  • Department of Automation and Applied Informatics and HAS-BUTE Control Research Group, Budapest University of Technology and Economics, Budapest, Hungary;Department of Automation and Applied Informatics and HAS-BUTE Control Research Group, Budapest University of Technology and Economics, Budapest, Hungary

  • Venue:
  • AIKED'05 Proceedings of the 4th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases
  • Year:
  • 2005

Abstract

Sequential pattern mining is a heavily researched area of data mining with a wide variety of applications. Discovering frequent sequences is challenging because the algorithm must process a combinatorially explosive number of possible sequences. Most methods for sequential pattern mining build on approaches to the traditional itemset mining task, because the former can be interpreted as a generalization of the latter. Several algorithms use a level-wise "candidate generate and test" approach, while others use projected databases to discover the frequent sequences. This paper presents a classification of the well-known sequence mining algorithms. Because each algorithm has its own advantages and drawbacks in execution time and memory requirements, and the exact aims of the algorithms differ as well, an exact ranking of the methods is omitted. A basic level-wise algorithm, GSP, is described in detail. Because level-wise algorithms generally need less memory than projection-based ones, an efficient implementation of the GSP algorithm is also suggested. Two novel methods, the Bitmap-based GSP (BGSP) and the SM-Tree (State Machine-Tree) algorithms, are presented as enhancements of the GSP-based sequential pattern mining approach.
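
The level-wise "candidate generate and test" idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is a simplification and not the implementation from the paper: it assumes every sequence element is a single item rather than an itemset, ignores time constraints and taxonomies, counts support by a plain scan instead of a hash tree, and does not attempt to reproduce BGSP or SM-Tree. The names `is_subsequence` and `level_wise_mine` are hypothetical and chosen only for this example.

```python
from collections import Counter


def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so matches must appear in order.
    return all(item in it for item in pattern)


def level_wise_mine(database, min_support):
    """Simplified level-wise mining over sequences of single items.

    Assumptions (not from the paper): `database` is a list of tuples of
    hashable items, and `min_support` is an absolute count >= 1.
    Returns a dict mapping each frequent pattern (a tuple) to its support.
    """
    # Level 1: count the number of sequences in which each item occurs.
    item_counts = Counter(item for seq in database for item in set(seq))
    frequent = {(item,): n for item, n in item_counts.items() if n >= min_support}
    result = dict(frequent)

    while frequent:
        # Generate: join p and q when p without its first item equals
        # q without its last item, then extend p by the last item of q.
        candidates = {
            p + (q[-1],)
            for p in frequent
            for q in frequent
            if p[1:] == q[:-1]
        }
        # Test: scan the database and keep candidates with enough support.
        frequent = {}
        for cand in candidates:
            support = sum(is_subsequence(cand, seq) for seq in database)
            if support >= min_support:
                frequent[cand] = support
        result.update(frequent)

    return result
```

For instance, on the toy database `[("a", "b", "c"), ("a", "c"), ("a", "b", "c", "b"), ("b", "c")]` with `min_support=3`, the sketch returns the three single items together with the patterns `("a", "c")` and `("b", "c")`. Each pass through the loop corresponds to one level, so the database is scanned once per pattern length, which is the cost that the more efficient implementations aim to reduce.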