Brief communication: Computation of mutual information from Hidden Markov Models

  • Authors:
  • Daniel Reker; Stefan Katzenbeisser; Kay Hamacher

  • Affiliations:
  • Theoretical Biology and Bioinformatics, Institute of Microbiology and Genetics, Department of Biology, TU Darmstadt, Schnittspahnstr. 10, 64287 Darmstadt, Germany, and Security Engineering Group, Department of Computer Science, Technische Universität Darmstadt, 64287 Darmstadt, Germany
  • Security Engineering Group, Department of Computer Science, Technische Universität Darmstadt, 64287 Darmstadt, Germany
  • Theoretical Biology and Bioinformatics, Institute of Microbiology and Genetics, Department of Biology, TU Darmstadt, Schnittspahnstr. 10, 64287 Darmstadt, Germany

  • Venue:
  • Computational Biology and Chemistry
  • Year:
  • 2010

Abstract

Understanding evolution at the sequence level is one of the major research visions of bioinformatics. To this end, several abstract models, such as Hidden Markov Models, and several quantitative measures, such as the mutual information, have been introduced, thoroughly investigated, and applied to concrete studies in molecular biology. With this contribution we take a first step towards merging these approaches (models and measures) for easy and immediate computation, e.g. for a database of a large number of externally fitted models (such as PFAM). Being able to compute such measures is of paramount importance in data mining, model development, and model comparison. Here we describe how the mutual information of a homogeneous Hidden Markov Model can be computed efficiently, orders of magnitude faster than with a naive, straightforward approach. In addition, our algorithm avoids the sampling issues of real-world sequences, thus allowing for a direct comparison of various models. We applied the method to genomic sequences and discuss its properties as well as convergence issues.
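
For concreteness, the following minimal NumPy sketch illustrates what "mutual information from a homogeneous HMM" means when evaluated directly from the model parameters rather than from sampled sequences: the joint distribution of the symbols emitted at two positions a given lag apart follows from the stationary distribution of the hidden chain, a power of the transition matrix A, and the emission matrix B. This is an illustrative sketch, not the authors' published algorithm; the function names and the two-state toy model at the bottom are hypothetical.

    import numpy as np

    def stationary_distribution(A):
        """Stationary distribution of the hidden chain (left eigenvector of A for eigenvalue 1)."""
        evals, evecs = np.linalg.eig(A.T)
        pi = np.real(evecs[:, np.argmax(np.real(evals))])
        return pi / pi.sum()

    def mutual_information(A, B, lag=1):
        """
        Mutual information (in bits) between the symbols emitted at two positions
        separated by `lag`, in the stationary regime of a homogeneous HMM with
        hidden-state transition matrix A (n_states x n_states) and emission
        matrix B (n_states x n_symbols).
        """
        pi = stationary_distribution(A)
        Ak = np.linalg.matrix_power(A, lag)
        # joint distribution of (symbol at position i, symbol at position i+lag):
        # P(a, b) = sum_{s,t} pi_s * B_{s,a} * (A^lag)_{s,t} * B_{t,b}
        joint = B.T @ np.diag(pi) @ Ak @ B
        p_a = joint.sum(axis=1)   # marginal at position i
        p_b = joint.sum(axis=0)   # marginal at position i+lag
        mask = joint > 0          # skip zero-probability cells in the log
        return float(np.sum(joint[mask] *
                            np.log2(joint[mask] / np.outer(p_a, p_b)[mask])))

    # toy two-state model with a binary alphabet (made-up numbers)
    A = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
    B = np.array([[0.95, 0.05],
                  [0.10, 0.90]])
    print(mutual_information(A, B, lag=1))

Because the quantity is obtained in closed form from A and B, models fitted elsewhere (e.g. entries of a model database) can be compared directly, without the finite-sample noise that would arise from estimating the same quantity from generated or real sequences.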