A parameterizable enumeration algorithm for sequence mining

  • Authors: J. David; L. Nourine

  • Venue: Theoretical Computer Science
  • Year: 2013

Quantified Score

Hi-index 5.23

Abstract

In this paper, we introduce a generic framework for mining sequences under various constraints. More precisely, we study the enumeration of all partitions of a word w into multisets of subsequences. We show that, using additional predicates, this generator can be used for frequent subsequence and substring mining. We define the transition graph T_w, whose vertices are multisets of words and whose arcs are transitions between multisets. We show that T_w is a directed acyclic graph and that it admits a covering tree. We use T_w to propose a generic algorithm that enumerates, without redundancy, all multisets that satisfy a set of predicates.
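
To make the objects in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the authors' algorithm: it does not build the transition graph T_w or its covering tree, and it avoids redundancy only by remembering every multiset already seen. It assumes that a partition of w assigns each position of w to exactly one block and that each block is read in position order to give a subsequence. The function name `enumerate_multisets` and the example predicate `has_repeated_word` are hypothetical names introduced for illustration.

```python
from collections import Counter


def enumerate_multisets(w, predicates=()):
    """Yield each multiset of subsequences that partitions w, exactly once.

    A multiset is reported as a sorted tuple of words. Only multisets
    satisfying every predicate are yielded; each predicate receives a
    Counter mapping word -> multiplicity. Brute force, illustration only.
    """
    seen = set()  # canonical forms already produced (naive deduplication)

    def place(i, blocks):
        if i == len(w):
            key = tuple(sorted(blocks))
            if key not in seen:
                seen.add(key)
                if all(p(Counter(key)) for p in predicates):
                    yield key
            return
        # Either append letter w[i] to an existing block ...
        for k in range(len(blocks)):
            extended = blocks[:k] + [blocks[k] + w[i]] + blocks[k + 1:]
            yield from place(i + 1, extended)
        # ... or open a new block containing only w[i].
        yield from place(i + 1, blocks + [w[i]])

    yield from place(0, [])


def has_repeated_word(multiset):
    """Hypothetical predicate: some word occurs at least twice."""
    return any(count >= 2 for count in multiset.values())


if __name__ == "__main__":
    for m in enumerate_multisets("aab"):
        print(m)
    print(list(enumerate_multisets("aab", [has_repeated_word])))
```

For w = "aab", the five ways of grouping the three positions give only four distinct multisets, because {a, ab} arises twice. This kind of redundancy is what a non-redundant enumeration has to avoid without storing every result.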