We present VOGUE, a novel variable-order hidden Markov model with state durations. It combines two separate techniques for modeling complex patterns in sequential data: pattern mining and data modeling. VOGUE relies on a variable-gap sequence mining method to extract frequent patterns of varying length, with varying gaps between their elements. It then uses these mined sequences to build a variable-order hidden Markov model (HMM) that explicitly models the gaps. The gaps implicitly model the order of the HMM and explicitly model the duration of each state. We apply VOGUE to a variety of real sequence data from domains such as protein sequence classification, Web usage logs, intrusion detection, and spelling correction. We show that VOGUE achieves higher classification accuracy than regular HMMs, higher-order HMMs, and even special-purpose HMMs such as HMMER, a state-of-the-art method for protein classification. The VOGUE implementation and the datasets used in this article are available as open source.¹
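The abstract describes VOGUE's first stage only at a high level. The Python sketch below illustrates the general idea behind variable-gap mining. It finds symbol pairs that co-occur within a bounded gap and records how many symbols were skipped between them. A normalized gap histogram of this kind is one simple statistic from which a gap (duration) state could be estimated. This is an illustration under stated assumptions, not the paper's mining algorithm or the released VOGUE code. The function names `mine_gapped_pairs` and `gap_distribution` are hypothetical, as are the restriction to length-2 patterns, the per-sequence support definition, and the `max_gap` and `min_support` parameters.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple

Pair = Tuple[str, str]


def mine_gapped_pairs(
    sequences: Sequence[Sequence[str]],
    max_gap: int,
    min_support: int,
) -> Dict[Pair, Counter]:
    """Hypothetical sketch: frequent (a, b) pairs plus a histogram of their gaps.

    A pair (a, b) occurs when b follows a with at most `max_gap` symbols in
    between. Support counts sequences containing the pair at least once; the
    gap histogram counts every occurrence.
    """
    support: Counter = Counter()
    gap_hist: Dict[Pair, Counter] = defaultdict(Counter)
    for seq in sequences:
        seen_in_seq = set()
        for i, a in enumerate(seq):
            # Look ahead over positions i+1 .. i+1+max_gap (clipped to the sequence end).
            for j in range(i + 1, min(len(seq), i + 2 + max_gap)):
                b = seq[j]
                gap = j - i - 1  # number of symbols skipped between a and b
                gap_hist[(a, b)][gap] += 1
                seen_in_seq.add((a, b))
        support.update(seen_in_seq)  # each pair counted at most once per sequence
    return {pair: gap_hist[pair] for pair, s in support.items() if s >= min_support}


def gap_distribution(hist: Counter) -> Dict[int, float]:
    """Normalize a gap histogram into an empirical duration distribution."""
    total = sum(hist.values())
    return {gap: count / total for gap, count in sorted(hist.items())}


if __name__ == "__main__":
    seqs = [list("ABCAB"), list("ACB"), list("AB")]
    frequent = mine_gapped_pairs(seqs, max_gap=1, min_support=3)
    for pair, hist in frequent.items():
        print(pair, gap_distribution(hist))  # ('A', 'B') {0: 0.75, 1: 0.25}
```

Support is counted once per sequence rather than once per occurrence, which follows the usual convention in sequence mining. The per-occurrence gap counts are kept separately, so the frequency filter and the duration estimate do not interfere with each other.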