VOGUE: a novel variable order-gap state machine for modeling sequences

Authors:
Bouchra Bouqata;Christopher D. Carothers;Boleslaw K. Szymanski;Mohammed J. Zaki
Affiliations:
CS Department, Rensselaer Polytechnic Institute, Troy, NY;CS Department, Rensselaer Polytechnic Institute, Troy, NY;CS Department, Rensselaer Polytechnic Institute, Troy, NY;CS Department, Rensselaer Polytechnic Institute, Troy, NY
Venue:
PKDD'06 Proceedings of the 10th European conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2006

Abstract

We present VOGUE, a new state machine that combines two separate techniques for modeling long-range dependencies in sequential data: data mining and data modeling. VOGUE relies on a novel Variable-Gap Sequence mining method (VGS) to mine frequent patterns with different lengths and gaps between elements. It then uses these mined sequences to build the state machine. We applied VOGUE to the task of protein sequence classification on real data from the PROSITE protein families. We show that VOGUE yields significantly better scores than higher-order Hidden Markov Models. Moreover, we show that VOGUE's classification sensitivity is higher than that of HMMER, a state-of-the-art method for protein classification.
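
To make the idea of mining patterns "with different lengths and gaps between elements" concrete, the following is a minimal Python sketch of gapped pattern counting. It is not the authors' VGS algorithm: it is restricted to patterns of length two, it counts every occurrence within a single sequence, and the function names and the `max_gap` and `min_support` parameters are hypothetical choices made for illustration only.

```python
from collections import Counter


def gapped_pair_counts(sequence, max_gap):
    """Count (first, second, gap) triples in one sequence.

    `gap` is the number of symbols skipped between the two elements,
    so gap 0 means the elements are adjacent. This is an illustrative
    simplification, not the VGS method described in the abstract.
    """
    counts = Counter()
    n = len(sequence)
    for i in range(n):
        for gap in range(max_gap + 1):
            j = i + gap + 1
            if j >= n:
                break  # larger gaps would also run past the end
            counts[(sequence[i], sequence[j], gap)] += 1
    return counts


def frequent_gapped_pairs(sequence, max_gap, min_support):
    """Keep only the gapped pairs seen at least `min_support` times."""
    counts = gapped_pair_counts(sequence, max_gap)
    return {key: c for key, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    # Hypothetical toy input, not data from the paper.
    for key, count in sorted(frequent_gapped_pairs("ABCABC", 1, 2).items()):
        print(key, count)
```

In this toy input, the pair A followed by C is only found when one symbol may be skipped, which is the kind of dependency that a purely adjacent (first-order) view of the sequence would miss. How VOGUE turns such mined patterns into states and transitions is not specified in the abstract and is not sketched here.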