CAMLS: a constraint-based apriori algorithm for mining long sequences

  • Authors:
  • Yaron Gonen;Nurit Gal-Oz;Ran Yahalom;Ehud Gudes

  • Affiliations:
  • Department of Computer Science, Ben Gurion University of the Negev, Israel;Department of Computer Science, Ben Gurion University of the Negev, Israel;Department of Computer Science, Ben Gurion University of the Negev, Israel;Department of Computer Science, Ben Gurion University of the Negev, Israel

  • Venue:
  • DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part I
  • Year:
  • 2010

Abstract

Mining sequential patterns is a key objective in the field of data mining due to its wide range of applications. Given a database of sequences, the challenge is to identify patterns that appear frequently in different sequences. Well-known algorithms have proved to be efficient; however, they do not perform well when mining databases that contain long frequent sequences. We present CAMLS, Constraint-based Apriori Mining of Long Sequences, an efficient algorithm for mining long sequential patterns under constraints. CAMLS is based on the apriori property and consists of two phases, event-wise and sequence-wise, each of which employs an iterative process of candidate generation followed by frequency testing. The separation into these two phases allows us to: (i) introduce a novel candidate pruning strategy that increases the efficiency of the mining process and (ii) easily incorporate intra-event and inter-event constraints. Experiments on both synthetic and real datasets show that CAMLS outperforms previous algorithms when mining long sequences.
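
To make the two-phase idea concrete, the following is a minimal sketch of a generic apriori-style miner, not the CAMLS algorithm itself. It assumes a sequence is a list of events and an event is a set of items; it first mines frequent events (event-wise) and then chains them into frequent sequences (sequence-wise), each time generating candidates and then testing their frequency. The function names, the data layout, and the simple subset/suffix pruning are all assumptions made for illustration; the paper's own pruning strategy and its intra-event and inter-event constraints are not reproduced here.

```python
from itertools import combinations


def contains(sequence, pattern):
    """True if the pattern's events occur in order, each inside a distinct event."""
    pos = 0
    for event in pattern:
        # Greedily advance to the earliest event that is a superset of `event`.
        while pos < len(sequence) and not event <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(db, pattern):
    """Number of database sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in db)


def frequent_events(db, min_sup):
    """Event-wise phase (illustrative): level-wise mining of frequent itemsets."""
    items = sorted({i for seq in db for ev in seq for i in ev})
    level = [frozenset([i]) for i in items
             if support(db, (frozenset([i]),)) >= min_sup]
    result = []
    while level:
        result.extend(level)
        known = set(level)
        # Candidate generation: join two k-itemsets into a (k+1)-itemset.
        joined = {a | b for a, b in combinations(level, 2)
                  if len(a | b) == len(a) + 1}
        # Apriori pruning: every k-subset of a candidate must be frequent.
        candidates = [c for c in joined if all(c - {i} in known for i in c)]
        # Frequency testing.
        level = [c for c in candidates if support(db, (c,)) >= min_sup]
    return result


def frequent_sequences(db, min_sup):
    """Sequence-wise phase (illustrative): extend patterns one event at a time.

    Assumes min_sup >= 1, otherwise the loop would not terminate.
    """
    events = frequent_events(db, min_sup)
    level = [(e,) for e in events]
    result = []
    while level:
        result.extend(level)
        known = set(level)
        # Candidate generation with apriori pruning on the length-k suffix.
        candidates = [p + (e,) for p in level for e in events
                      if (p + (e,))[1:] in known]
        # Frequency testing.
        level = [c for c in candidates if support(db, c) >= min_sup]
    return result


# Hypothetical toy database: three sequences of events over items a, b, c.
db = [
    [{"a"}, {"b"}, {"c"}],
    [{"a"}, {"b", "c"}],
    [{"a", "b"}, {"c"}],
]
patterns = frequent_sequences(db, min_sup=2)
```

With this toy database and a minimum support of 2, no multi-item event is frequent, so the sequence-wise phase only combines the single-item events. Because every candidate is counted with a full database scan, the sketch illustrates the generate-and-test loop rather than an efficient implementation.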