Sequence classification is central to many practical problems within machine learning. Distance metrics between arbitrary pairs of sequences can be hard to define because sequences vary in length, and the information contained in the ordering of sequence elements is lost when standard metrics such as Euclidean distance are applied. We present a scheme that employs a Hidden Markov Model variant to produce a set of fixed-length description vectors from a set of sequences. We then define three inference algorithms for estimating the model parameters: a Baum-Welch variant, a Gibbs sampling algorithm, and a variational algorithm. Finally, we show experimentally that the fixed-length representation produced by these inference methods is useful for classifying amino acid sequences into structural classes.
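To make the core idea concrete, below is a minimal, hypothetical sketch (not the paper's actual model) of how an HMM can map variable-length discrete sequences to fixed-length vectors: run the standard forward-backward algorithm to get posterior state occupancies, then average them over positions, yielding one number per hidden state regardless of sequence length. All parameter values here are illustrative.

```python
# Hypothetical illustration: fixed-length description vectors from an HMM
# via average posterior state occupancy. Parameters are made up, not taken
# from the paper; a real system would fit them with Baum-Welch, Gibbs
# sampling, or variational inference as the abstract describes.

def forward_backward(obs, pi, A, B):
    """Posterior state probabilities gamma[t][s] for a discrete-output HMM.

    obs : list of symbol indices
    pi  : initial state distribution, length n_states
    A   : transition matrix, A[r][s] = P(s_t = s | s_{t-1} = r)
    B   : emission matrix, B[s][o] = P(o_t = o | s_t = s)
    """
    n_states, T = len(pi), len(obs)

    # Forward pass (alpha), normalized at each step for numerical stability.
    alpha = [[pi[s] * B[s][obs[0]] for s in range(n_states)]]
    scale = [sum(alpha[0])]
    alpha[0] = [a / scale[0] for a in alpha[0]]
    for t in range(1, T):
        row = [B[s][obs[t]] * sum(alpha[t - 1][r] * A[r][s]
                                  for r in range(n_states))
               for s in range(n_states)]
        z = sum(row)
        scale.append(z)
        alpha.append([a / z for a in row])

    # Backward pass (beta), reusing the forward scaling factors.
    beta = [[1.0] * n_states for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for s in range(n_states):
            beta[t][s] = sum(A[s][r] * B[r][obs[t + 1]] * beta[t + 1][r]
                             for r in range(n_states)) / scale[t + 1]

    # gamma[t][s] is proportional to alpha[t][s] * beta[t][s].
    gamma = []
    for t in range(T):
        w = [alpha[t][s] * beta[t][s] for s in range(n_states)]
        z = sum(w)
        gamma.append([x / z for x in w])
    return gamma

def fixed_length_vector(obs, pi, A, B):
    """Average posterior occupancy per state: the sequence's description
    vector has dimension n_states no matter how long the sequence is."""
    gamma = forward_backward(obs, pi, A, B)
    n_states, T = len(pi), len(gamma)
    return [sum(g[s] for g in gamma) / T for s in range(n_states)]
```

Because the vector has one entry per hidden state, sequences of any length land in the same feature space, where an ordinary classifier (and Euclidean distance) can be applied directly.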