With the recent development of high-throughput experimental techniques, the variety and volume of accumulated biological data have grown dramatically over the past few years. Mining different types of data together may lead to new biological insights. We present a new methodology for systematically combining three different datasets to find biologically active metabolic paths and patterns. The method consists of two steps. First, it efficiently synthesizes metabolic paths from a given set of known chemical reactions whose enzymes are co-expressed. Second, it represents the resulting paths in a more comprehensible form by estimating the parameters of a probabilistic model from the synthesized paths. The model is built on the assumption that the entire set of chemical reactions corresponds to a Markov state transition diagram. It is also a hierarchical latent variable model in which a set of protein classes serves as a latent variable, so that input paths are clustered in terms of existing knowledge of protein classes. We tested our method on the main glycolysis pathway and found that it achieved higher predictive performance in classifying gene expression data than other unsupervised methods. We further analyzed the estimated parameters of our probabilistic models and found that, for each type of expression experiment, the biologically active paths were clustered into only two or three patterns, each of which suggested new long-range relations in the glycolysis pathway.
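The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. The toy reaction list, the `(substrate, product, enzyme)` tuple layout, and the function names `synthesize_paths` and `estimate_transitions` are all assumptions made for illustration. The second function fits only a plain first-order Markov chain by counting, and it omits the hierarchical latent variable over protein classes that the abstract describes.

```python
from collections import defaultdict

# Hypothetical toy data: each reaction is (substrate, product, enzyme).
REACTIONS = [
    ("A", "B", "e1"),
    ("B", "C", "e2"),
    ("B", "D", "e3"),
    ("C", "E", "e4"),
    ("D", "E", "e5"),
]


def synthesize_paths(reactions, coexpressed, start, goal):
    """Enumerate simple paths from start to goal, using only reactions
    whose enzyme is in the co-expressed set. Each path is returned as
    a tuple of enzyme names."""
    graph = defaultdict(list)
    for substrate, product, enzyme in reactions:
        if enzyme in coexpressed:
            graph[substrate].append((product, enzyme))

    paths = []

    def dfs(node, visited, enzymes):
        if node == goal:
            paths.append(tuple(enzymes))
            return
        for nxt, enzyme in graph.get(node, []):
            if nxt not in visited:  # keep paths simple (no revisits)
                dfs(nxt, visited | {nxt}, enzymes + [enzyme])

    dfs(start, {start}, [])
    return paths


def estimate_transitions(paths):
    """Maximum-likelihood first-order transition probabilities between
    consecutive enzymes, obtained by counting and normalizing.
    Simplification: no mixture components or latent protein classes."""
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        for cur, nxt in zip(path, path[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for cur, row in counts.items()
    }


all_paths = synthesize_paths(REACTIONS, {"e1", "e2", "e3", "e4", "e5"}, "A", "E")
transitions = estimate_transitions(all_paths)
```

Dropping an enzyme from the co-expressed set removes every path that depends on it, which is how expression data would restrict the candidate paths in this simplified setting.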