Using interesting sequences to interactively build Hidden Markov Models

  • Authors: Szymon Jaroszewicz
  • Affiliations: National Institute of Telecommunications, Warsaw, Poland
  • Venue: Data Mining and Knowledge Discovery
  • Year: 2010

Abstract

The paper presents a method for interactively constructing global Hidden Markov Models (HMMs) from local sequence patterns discovered in data. The method finds interesting sequences, that is, sequences whose frequency in the database differs from the frequency predicted by the current model. These patterns are presented to the user, who updates the model using their own judgment and understanding of the modelled domain. It is demonstrated that this approach leads to more understandable models than automated approaches. Two variants of the problem are considered: mining patterns that occur only at the beginning of sequences and mining patterns that occur at any position; both are practically meaningful. For each variant, algorithms have been developed that efficiently discover all sequences with a given minimum interestingness. Applications to modelling web page visitor behavior and protein secondary structure are presented, validating the proposed approach.
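
The idea of comparing observed and model-predicted frequencies can be illustrated for the first variant (patterns at the beginning of sequences). The sketch below is not the paper's algorithm: it only scores a single candidate prefix, whereas the paper describes algorithms that discover all sufficiently interesting sequences efficiently. It assumes that interestingness is the absolute difference between the two frequencies (the abstract does not state the exact definition), that the HMM has no explicit end state, and that every database sequence is at least as long as the prefix. The function names and the toy model are hypothetical.

```python
from typing import Dict, List, Sequence


def prefix_probability(
    prefix: Sequence[str],
    init: List[float],
    trans: List[List[float]],
    emit: List[Dict[str, float]],
) -> float:
    """Forward algorithm: probability that the HMM's output starts with `prefix`.

    init[s]     : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][x]  : probability that state s emits symbol x
    """
    if not prefix:
        return 1.0
    n = len(init)
    # alpha[s] = P(symbols emitted so far, current state = s)
    alpha = [init[s] * emit[s].get(prefix[0], 0.0) for s in range(n)]
    for sym in prefix[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s].get(sym, 0.0)
            for s in range(n)
        ]
    return sum(alpha)


def prefix_frequency(prefix: Sequence[str], database: List[Sequence[str]]) -> float:
    """Fraction of database sequences that begin with `prefix`."""
    k = len(prefix)
    hits = sum(1 for seq in database if tuple(seq[:k]) == tuple(prefix))
    return hits / len(database)


def interestingness(
    prefix: Sequence[str],
    database: List[Sequence[str]],
    init: List[float],
    trans: List[List[float]],
    emit: List[Dict[str, float]],
) -> float:
    """Assumed measure: |observed frequency - frequency predicted by the model|."""
    observed = prefix_frequency(prefix, database)
    predicted = prefix_probability(prefix, init, trans, emit)
    return abs(observed - predicted)


# Toy two-state model (hypothetical): state 0 always emits "a", state 1 always
# emits "b", and both states strongly prefer to stay where they are.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [{"a": 1.0}, {"b": 1.0}]

# Toy database in which "ab" starts half of the sequences.
database = ["ab", "ab", "aa", "bb"]

score = interestingness("ab", database, init, trans, emit)
```

In this toy setting the model considers a switch from "a" to "b" unlikely, while the data shows it often, so the prefix "ab" receives a high score. In the interactive workflow described above, such a pattern would be shown to the user, who then decides how to change the model.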