VOGUE: a novel variable order-gap state machine for modeling sequences

  • Authors:
  • Bouchra Bouqata;Christopher D. Carothers;Boleslaw K. Szymanski;Mohammed J. Zaki

  • Affiliations:
  • CS Department, Rensselaer Polytechnic Institute, Troy, NY;CS Department, Rensselaer Polytechnic Institute, Troy, NY;CS Department, Rensselaer Polytechnic Institute, Troy, NY;CS Department, Rensselaer Polytechnic Institute, Troy, NY

  • Venue:
  • PKDD'06 Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases
  • Year:
  • 2006

Abstract

We present VOGUE, a new state machine that combines two separate techniques for modeling long range dependencies in sequential data: data mining and data modeling. VOGUE relies on a novel Variable-Gap Sequence mining method (VGS) to mine frequent patterns with different lengths and different gaps between elements. It then uses these mined sequences to build the state machine. We applied VOGUE to the task of protein sequence classification on real data from the PROSITE protein families. We show that VOGUE yields significantly better scores than higher-order Hidden Markov Models. Moreover, we show that VOGUE's classification sensitivity outperforms that of HMMER, a state-of-the-art method for protein classification.
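
To make the idea of mining frequent patterns with gaps more concrete, here is a minimal sketch in Python. It is not the authors' VGS algorithm, whose details are not given in this abstract. It only counts two-element patterns in which the second symbol follows the first with a bounded number of symbols in between. The function name `mine_gapped_pairs` and the parameters `max_gap` and `min_support` are assumptions introduced for illustration.

```python
from collections import Counter


def mine_gapped_pairs(sequences, max_gap, min_support):
    """Return {(first, second, gap): support} for frequent gapped pairs.

    Illustrative only: `gap` is the number of symbols between the two
    elements, and support is the number of sequences containing the
    pattern at least once.
    """
    support = Counter()
    for seq in sequences:
        seen = set()  # patterns found in this sequence (counted once each)
        for i, first in enumerate(seq):
            for gap in range(max_gap + 1):
                j = i + gap + 1
                if j >= len(seq):
                    break  # larger gaps would also run past the end
                seen.add((first, seq[j], gap))
        support.update(seen)
    # Keep only patterns that occur in at least `min_support` sequences.
    return {p: c for p, c in support.items() if c >= min_support}


# Toy input, not real PROSITE data.
frequent = mine_gapped_pairs(["ACBD", "AXBD", "ABD"], max_gap=1, min_support=2)
print(frequent)
```

On this toy input, the pattern "A, one arbitrary symbol, B" is found in two of the three sequences even though the symbol in the gap differs, which is the kind of dependency a fixed-order model of adjacent symbols would miss.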