We present VOGUE, a new state machine that combines two separate techniques for modeling long-range dependencies in sequential data: data mining and data modeling. VOGUE relies on a novel Variable-Gap Sequence mining method (VGS) to mine frequent patterns with different lengths and different gaps between elements. It then uses these mined sequences to build the state machine. We applied VOGUE to the task of protein sequence classification on real data from the PROSITE protein families. We show that VOGUE yields significantly better scores than higher-order Hidden Markov Models. Moreover, we show that VOGUE's classification sensitivity exceeds that of HMMER, a state-of-the-art method for protein classification.
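To make the idea of mining patterns "with different gaps between elements" concrete, the following is a minimal sketch of gap-constrained pair counting. It is not the VGS algorithm itself, whose details are not given in this abstract. The function name `mine_gapped_pairs`, the parameters `max_gap` and `min_support`, the restriction to two-element patterns, and the counting of every occurrence are all assumptions made for illustration.

```python
from collections import Counter


def mine_gapped_pairs(sequence, max_gap, min_support):
    """Illustrative only: count ordered pairs (a, b) in which b follows a
    with at most `max_gap` symbols in between, then keep the pairs that
    occur at least `min_support` times."""
    pair_counts = Counter()  # (a, b) -> total occurrences over all gaps
    gap_counts = Counter()   # (a, b, gap) -> occurrences at that exact gap

    for i, a in enumerate(sequence):
        # gap = number of symbols skipped between a and b
        for gap in range(max_gap + 1):
            j = i + gap + 1
            if j >= len(sequence):
                break
            b = sequence[j]
            pair_counts[(a, b)] += 1
            gap_counts[(a, b, gap)] += 1

    # Keep only pairs that meet the (assumed) support threshold.
    frequent = {pair: n for pair, n in pair_counts.items() if n >= min_support}
    # Keep the per-gap breakdown for the frequent pairs only.
    gaps = {key: n for key, n in gap_counts.items() if key[:2] in frequent}
    return frequent, gaps
```

Calling `mine_gapped_pairs("ABACAB", max_gap=1, min_support=2)` returns the frequent pairs together with a per-gap breakdown. A model built on such patterns could, in principle, use the breakdown to estimate how long the gap between two pattern elements tends to be.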