A multi-stage approach to clustering and imputation of gene expression profiles

Authors:
Dorothy S. V. Wong; Frederick K. Wong; Graham R. Wood
Venue:
Bioinformatics
Year:
2007

Citing 0
Cited 5

Assessing agreement of clustering methods with gene expression microarray data
Computational Statistics & Data Analysis

Bayesian Inference on Hidden Knowledge in High-Throughput Molecular Biology Data
PRICAI '08 Proceedings of the 10th Pacific Rim International Conference on Artificial Intelligence: Trends in Artificial Intelligence

How to improve postgenomic knowledge discovery using imputation
EURASIP Journal on Bioinformatics and Systems Biology - Special issue on applications of signal processing techniques to bioinformatics, genomics, and proteomics

Autoregressive-model-based missing value estimation for DNA microarray time series data
IEEE Transactions on Information Technology in Biomedicine

Comparing fuzzy, probabilistic, and possibilistic partitions
IEEE Transactions on Fuzzy Systems

Abstract

Motivation: Microarray experiments have revolutionized the study of gene expression with their ability to generate large amounts of data. This article describes an alternative to existing approaches to clustering of gene expression profiles; the key idea is to cluster in stages using a hierarchy of distance measures. This method is motivated by the way in which the human mind sorts and so groups many items. The distance measures arise from the orthogonal breakup of Euclidean distance, giving us a set of independent measures of different attributes of the gene expression profile. Interpretation of these distances is closely related to the statistical design of the microarray experiment. This clustering method not only accommodates missing data but also leads to an associated imputation method.

Results: The performance of the clustering and imputation methods was tested on a simulated dataset, a yeast cell cycle dataset and a central nervous system development dataset. Based on the Rand and adjusted Rand indices, the clustering method is more consistent with the biological classification of the data than commonly used clustering methods. The imputation method, at varying levels of missingness, outperforms most imputation methods, based on root mean squared error (RMSE).

Availability: Code in R is available on request from the authors.

Contact: dwong@efs.mq.edu.au
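
Illustration: The abstract does not spell out which orthogonal components the authors use, so the following is only a minimal sketch of the general idea of an "orthogonal breakup of Euclidean distance". It assumes the simplest possible split, into a level part (difference in mean expression) and a shape part (what remains after removing the mean difference). The function name `split_squared_distance` and the example profiles are hypothetical and are not taken from the paper or its R code.

```python
import math


def split_squared_distance(x, y):
    """Split the squared Euclidean distance between two profiles into
    two orthogonal parts (an assumed, illustrative decomposition):
      level: n * (mean difference)^2, the part due to overall expression level
      shape: sum of squared deviations of the differences about their mean
    The two parts always add up to the full squared Euclidean distance."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    level = n * mean_diff ** 2
    shape = sum((d - mean_diff) ** 2 for d in diffs)
    return level, shape


# Hypothetical expression profiles over four conditions.
profile_a = [1.0, 2.0, 3.0, 4.0]
profile_b = [2.0, 3.0, 4.0, 5.0]  # same shape as profile_a, shifted up by 1
profile_c = [4.0, 3.0, 2.0, 1.0]  # same mean as profile_a, reversed shape

level_ab, shape_ab = split_squared_distance(profile_a, profile_b)
level_ac, shape_ac = split_squared_distance(profile_a, profile_c)

# Full squared Euclidean distances, for comparison with level + shape.
total_ab = sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))
total_ac = sum((a - c) ** 2 for a, c in zip(profile_a, profile_c))
```

In this sketch, the pair (profile_a, profile_b) differs only in level and the pair (profile_a, profile_c) differs only in shape, which is the kind of separation that would let a staged procedure cluster on one attribute at a time. A decomposition tied to the experimental design, as the abstract describes, would presumably use more than two components.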