A Hierarchical Mixture of Markov Models for Finding Biologically Active Metabolic Paths Using Gene Expression and Protein Classes

Authors: Hiroshi Mamitsuka; Yasushi Okuno
Affiliations: Kyoto University; Kyoto University
Venue: CSB '04 Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference
Year: 2004

Abstract

With the recent development of high-throughput experimental techniques, the variety and volume of accumulated biological data have increased dramatically in recent years. Mining different types of data together may yield new biological insights. We present a new methodology that systematically combines three different datasets to find biologically active metabolic paths, or patterns. The method consists of two steps. First, it efficiently synthesizes metabolic paths from a given set of known chemical reactions whose enzymes are co-expressed. Second, it represents the obtained paths in a more comprehensible form by estimating the parameters of a probabilistic model from the synthesized paths. The model is built on the assumption that the entire set of chemical reactions corresponds to a Markov state transition diagram. It is also a hierarchical latent variable model in which a set of protein classes serves as a latent variable, so that input paths are clustered according to existing knowledge of protein classes. We tested the method on a main pathway of glycolysis and found that it achieved higher predictive performance in classifying gene expressions than other unsupervised methods. We further analyzed the estimated parameters of the probabilistic models and found that biologically active paths clustered into only two or three patterns for each type of expression experiment, and that each pattern suggested new long-range relations in the glycolysis pathway.
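
The idea of clustering paths with a mixture of Markov models can be illustrated with a minimal sketch. It is not the authors' model: it uses a flat mixture of first-order Markov chains and omits the hierarchical layer of protein classes described in the abstract. The state names `A`, `B`, `C`, the two components, and all probabilities are hypothetical values chosen only for illustration.

```python
import math


def path_log_prob(path, init, trans):
    """Log-probability of a path under one first-order Markov chain.

    init[s] is the probability of starting in state s;
    trans[a][b] is the probability of moving from state a to state b.
    """
    logp = math.log(init[path[0]])
    for a, b in zip(path, path[1:]):
        logp += math.log(trans[a][b])
    return logp


def responsibilities(path, weights, inits, transs):
    """Posterior probability of each mixture component given a path."""
    logs = [
        math.log(w) + path_log_prob(path, init, trans)
        for w, init, trans in zip(weights, inits, transs)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logs)
    exps = [math.exp(v - m) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-component mixture over three states (assumed values).
weights = [0.5, 0.5]
inits = [
    {"A": 0.8, "B": 0.1, "C": 0.1},
    {"A": 0.2, "B": 0.4, "C": 0.4},
]
transs = [
    {  # component 0 favours A -> B -> C
        "A": {"A": 0.1, "B": 0.8, "C": 0.1},
        "B": {"A": 0.1, "B": 0.1, "C": 0.8},
        "C": {"A": 0.1, "B": 0.1, "C": 0.8},
    },
    {  # component 1 favours C -> B -> A
        "A": {"A": 0.1, "B": 0.1, "C": 0.8},
        "B": {"A": 0.8, "B": 0.1, "C": 0.1},
        "C": {"A": 0.1, "B": 0.8, "C": 0.1},
    },
]

resp = responsibilities(["A", "B", "C"], weights, inits, transs)
print(resp)  # the first component should dominate for this path
```

In a full estimation procedure these responsibilities would form the E-step of an EM algorithm, with the weights, initial probabilities, and transition probabilities re-estimated from them in the M-step. The sketch stops at the responsibilities to stay short.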