Information Retrieval
Probabilistic hierarchical clustering for biological data
Proceedings of the sixth annual international conference on Computational biology
Center CLICK: A Clustering Algorithm with Applications to Gene Expression Analysis
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
Data Analysis and Visualization in Genomics and Proteomics
Data Analysis and Visualization in Genomics and Proteomics
International Journal of Software Science and Computational Intelligence
Hi-index | 0.00 |
A novel approach, model-based clustering, is described foridentifying complex interactions between genes or gene-categories based on static gene expression data. The approach deals with categorical data, which consists of a set of gene expressionprofiles belonging to one category, and a set belonging to anothercategory. An evolutionary algorithm (Meta-Optimizing Semantic Evolutionary Search, or MOSES) is used to learn an ensemble of classification models distinguishing the two categories, based on inputs that are features corresponding to gene expression values. Each feature is associated with a model-based vector, which encodes quantitative information regarding the utilization of the feature across the ensembles of models. Two different ways of constructing these vectors are explored. These model-based vectors are then clustered using a variant of hierarchical clustering called Omniclust. The result is a set of model-based clusters, in which features are gathered together if they are often considered together by classification models -- which may be because they're co-expressed, or may be for subtler reasons involving multi-gene interactions. The method is illustrated by applying it to two datasets regarding human gene expression, one drawn from brain cells and pertinent to the neurogenetics of aging, and the other drawn from blood cells and relating to differentiating between types of lymphoma. We find that, compared to traditional expression-based clustering, the new method often yields clusters that have higher mathematical quality (in the sense of homogeneity and separation) and also yield novel and meaningful insights into the underlying biological processes.