Efficient mining of sequential patterns with time constraints: Reducing the combinations

  • Authors:
  • F. Masseglia; P. Poncelet; M. Teisseire

  • Affiliations:
  • INRIA, 2004 Route des Lucioles - BP 93, 06902 Sophia Antipolis, France; EMA-LGI2P/Site EERIE, Parc Scientifique Georges Besse, 30035 Nîmes Cedex 1, France; LIRMM UMR CNRS 5506, 161 Rue Ada, 34392 Montpellier Cedex 5, France

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2009

Abstract

In this paper, we consider the problem of discovering sequential patterns under time constraints as defined in the GSP algorithm. Sequential patterns can be seen as temporal relationships between facts embedded in the database, where the facts considered are merely characteristics of individuals or observations of individual behavior; generalized sequential patterns aim to give the end user more flexible handling of the transactions embedded in the database. We therefore propose a new efficient algorithm, called GTC (Graph for Time Constraints), for mining such patterns in very large databases. It is based on the idea that handling time constraints at an early stage of the data mining process can be highly beneficial. One of the most significant new features of our approach is that time constraints can easily be taken into account in traditional levelwise approaches, since they are handled prior to and separately from the counting step of a data sequence. Our tests show that the proposed algorithm performs significantly faster than a state-of-the-art sequence mining algorithm.
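The abstract does not spell out the time constraints themselves. As a rough illustration only, the sketch below checks whether one data sequence supports a candidate pattern under the three constraints usually associated with GSP: a sliding window, a minimum gap, and a maximum gap. It is a brute-force check written for clarity, not the GTC algorithm, and the names `supports`, `min_gap`, `max_gap`, and `window_size` are hypothetical choices made for this example. It assumes transactions are sorted by strictly increasing timestamp.

```python
from typing import FrozenSet, List, Optional, Tuple

# One transaction: (timestamp, set of items). Names are illustrative only.
Transaction = Tuple[int, FrozenSet[str]]


def supports(
    data: List[Transaction],
    pattern: List[FrozenSet[str]],
    min_gap: float = 0,
    max_gap: float = float("inf"),
    window_size: float = 0,
) -> bool:
    """Brute-force test: does `data` support `pattern` under the constraints?

    Each pattern element i is matched against the union of transactions
    l_i..u_i, subject to:
      time(u_i) - time(l_i)     <= window_size
      time(l_i) - time(u_{i-1}) >  min_gap
      time(u_i) - time(l_{i-1}) <= max_gap
    """
    n = len(data)

    def search(i: int, prev_l: Optional[int], prev_u: Optional[int]) -> bool:
        if i == len(pattern):
            return True  # every pattern element has been matched
        start = 0 if prev_u is None else prev_u + 1
        for l in range(start, n):
            t_l = data[l][0]
            # The new element must start strictly more than min_gap later.
            if prev_u is not None and t_l - data[prev_u][0] <= min_gap:
                continue
            merged = set()
            for u in range(l, n):
                t_u = data[u][0]
                if t_u - t_l > window_size:
                    break  # window too wide; later u only gets worse
                if prev_l is not None and t_u - data[prev_l][0] > max_gap:
                    break  # too far from the previous element
                merged |= data[u][1]
                if pattern[i] <= merged and search(i + 1, l, u):
                    return True
        return False

    return search(0, None, None)


# Hypothetical data sequence of one customer.
data = [
    (1, frozenset({"a"})),
    (2, frozenset({"b"})),
    (5, frozenset({"c"})),
    (10, frozenset({"d"})),
]
a_then_c = [frozenset({"a"}), frozenset({"c"})]
ab_then_d = [frozenset({"a", "b"}), frozenset({"d"})]
```

With this data, `supports(data, a_then_c)` holds without constraints but fails once `max_gap=3` or `min_gap=4` is imposed, and `ab_then_d` is supported only when `window_size` is at least 1, so that the transactions at times 1 and 2 can be merged. The constraints are evaluated on their own here, independently of any support counting over a database.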