Elements of information theory
Elements of information theory
A structural EM algorithm for phylogenetic inference
RECOMB '01 Proceedings of the fifth annual international conference on Computational biology
Improving Text Classification by Shrinkage in a Hierarchy of Classes
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Center CLICK: A Clustering Algorithm with Applications to Gene Expression Analysis
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
Phylogenetic Inference in Protein Superfamilies: Analysis of SH2 Domains
ISMB '98 Proceedings of the 6th International Conference on Intelligent Systems for Molecular Biology
The Bayesian structural EM algorithm
UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
K-ary Clustering with Optimal Leaf Ordering for Gene Expression Data
WABI '02 Proceedings of the Second International Workshop on Algorithms in Bioinformatics
Clustering of diverse genomic data using information fusion
Proceedings of the 2004 ACM symposium on Applied computing
Utilizing hierarchical feature domain values for prediction
Data & Knowledge Engineering
Clustering gene expression data via mining ensembles of classification rules evolved using moses
Proceedings of the 9th annual conference on Genetic and evolutionary computation
Exploiting hierarchical domain values for Bayesian learning
PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
Biological data, such as gene expression profiles or protein sequences, is often organized in a hierarchy of classes, where the instances assigned to "nearby" classes in the tree are similar. Most approaches for constructing a hierarchy use simple local operations that are very sensitive to noise or variation in the data. In this paper, we describe probabilistic abstraction hierarchies (PAH) [11], a general probabilistic framework for clustering data into a hierarchy, and show how it can be applied to a wide variety of biological data sets. In a PAH, each class is associated with a probabilistic generative model for the data in the class. The PAH clustering algorithm simultaneously optimizes three things: the assignment of data instances to clusters, the models associated with the clusters, and the structure of the abstraction hierarchy. A unique feature of the PAH approach is that it utilizes global optimization algorithms for the last two steps, substantially reducing the sensitivity to noise and the propensity to local maxima. We show how to apply this framework to gene expression data, protein sequence data, and HIV protease sequence data. We also show how our framework supports hierarchies involving more than one type of data. We demonstrate that our method extracts useful biological knowledge and is substantially more robust than hierarchical agglomerative clustering.