IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Measuring gene expression over time can provide important insights into basic cellular processes. Identifying groups of genes with similar expression time-courses is a crucial first step in the analysis. Because biologically relevant groups frequently overlap, as genes can have several distinct roles in those cellular processes, this is a difficult problem for classical clustering methods. We use a mixture model to circumvent this principal problem, with hidden Markov models (HMMs) as effective and flexible components. We show that the ensuing estimation problem can be addressed with additional labeled data (partially supervised learning of mixtures) through a modification of the Expectation-Maximization (EM) algorithm. Good starting points for the mixture estimation are obtained through a modification to Bayesian model merging, which allows us to learn a collection of initial HMMs. We infer groups from mixtures with a simple information-theoretic decoding heuristic, which quantifies the level of ambiguity in group assignment. We demonstrate its effectiveness using high-quality annotation data. As the HMMs we propose capture asynchronous behavior by design, the groups we find are also asynchronous. Synchronous subgroups are obtained from a novel algorithm based on Viterbi paths. We show the suitability of our HMM mixture approach on biological and simulated data and through the favorable comparison with previous approaches. Software implementing the method is freely available under the GPL from http://ghmm.org/gql.
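The partially supervised E-step and the entropy-based decoding described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the hard clamping of labeled time-courses to their component, and the entropy threshold of 0.5 bits are assumptions made for illustration. The per-component log-likelihoods are assumed to come from some HMM forward pass that is not shown here.

```python
import math


def posteriors(log_likelihoods, weights, label=None):
    """Posterior over mixture components for one time-course.

    log_likelihoods[k] is log P(series | component k), assumed to be
    supplied by an HMM forward pass; weights[k] is the mixing proportion.
    If a label is given, the posterior is clamped to that component
    (an assumed, simple form of partial supervision).
    """
    if label is not None:
        return [1.0 if k == label else 0.0 for k in range(len(weights))]
    scores = [math.log(w) + ll for w, ll in zip(weights, log_likelihoods)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0.0)


def decode(p, max_entropy=0.5):
    """Assign the most probable component, or None if too ambiguous.

    max_entropy is a hypothetical threshold, not a value from the paper.
    """
    if entropy(p) > max_entropy:
        return None
    return max(range(len(p)), key=lambda k: p[k])
```

In this sketch, a time-course whose posterior is spread evenly across components has high entropy and is left unassigned, while one dominated by a single component is assigned to it. Labeled time-courses always decode to their label because their clamped posterior has zero entropy.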