Clustering gene expression data via mining ensembles of classification rules evolved using MOSES

Authors:
Moshe Looks;Ben Goertzel;Lucio de Souza Coelho;Mauricio Mudado;Cassio Pennachin
Affiliations:
Washington University in St. Louis: also SAIC, St. Louis, MO;Biomind LLC, Rockville, MD;Biomind LLC, Rockville, MD;Biomind LLC, Rockville, MD;Biomind LLC, Rockville, MD
Venue:
Proceedings of the 9th annual conference on Genetic and evolutionary computation
Year:
2007

Abstract

A novel approach, model-based clustering, is described for identifying complex interactions between genes or gene-categories based on static gene expression data. The approach deals with categorical data, which consists of a set of gene expression profiles belonging to one category, and a set belonging to another category. An evolutionary algorithm (Meta-Optimizing Semantic Evolutionary Search, or MOSES) is used to learn an ensemble of classification models distinguishing the two categories, based on inputs that are features corresponding to gene expression values. Each feature is associated with a model-based vector, which encodes quantitative information regarding the utilization of the feature across the ensembles of models. Two different ways of constructing these vectors are explored. These model-based vectors are then clustered using a variant of hierarchical clustering called Omniclust. The result is a set of model-based clusters, in which features are gathered together if they are often considered together by classification models -- which may be because they're co-expressed, or may be for subtler reasons involving multi-gene interactions. The method is illustrated by applying it to two datasets regarding human gene expression, one drawn from brain cells and pertinent to the neurogenetics of aging, and the other drawn from blood cells and relating to differentiating between types of lymphoma. We find that, compared to traditional expression-based clustering, the new method often yields clusters that have higher mathematical quality (in the sense of homogeneity and separation) and also yield novel and meaningful insights into the underlying biological processes.
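
The pipeline described in the abstract (ensemble of models, one usage vector per feature, hierarchical clustering of those vectors) can be illustrated with a small sketch. This is not the authors' implementation. It assumes each model is reduced to the set of features it uses, builds a binary usage vector per feature (one hypothetical construction; the abstract does not specify the two constructions the paper explores), and substitutes plain average-linkage agglomerative clustering with Jaccard distance for Omniclust. The gene names, the toy ensemble, and all function names are invented for illustration.

```python
from itertools import combinations

def usage_vectors(ensemble, features):
    """Map each feature to a binary vector: entry j is 1 if model j uses it.
    Assumption: a model is represented only by the set of features it uses."""
    return {f: [1 if f in model else 0 for model in ensemble] for f in features}

def jaccard_distance(u, v):
    """1 - |models using both| / |models using either|; 1.0 if neither is used."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 1.0 - both / either if either else 1.0

def average_linkage(vectors, n_clusters):
    """Plain agglomerative clustering (a stand-in for Omniclust, not Omniclust)."""
    clusters = [[f] for f in sorted(vectors)]

    def dist(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(jaccard_distance(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Toy ensemble: five hypothetical models, each listed by the features it uses.
ensemble = [{"g1", "g2"}, {"g1", "g2"}, {"g3", "g4"}, {"g3", "g4"}, {"g1", "g2", "g3"}]
features = ["g1", "g2", "g3", "g4"]

vectors = usage_vectors(ensemble, features)
clusters = average_linkage(vectors, n_clusters=2)
print(clusters)
```

In this toy case, features that the models tend to use together (g1 with g2, g3 with g4) end up in the same cluster, regardless of whether their expression values are correlated.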