Missing value imputation framework for microarray significant gene selection and class prediction

Authors:
Muhammad Shoaib B. Sehgal; Iqbal Gondal; Laurence Dooley
Affiliations:
Faculty of IT, Monash University, Churchill, VIC, Australia;Faculty of IT, Monash University, Churchill, VIC, Australia;Faculty of IT, Monash University, Churchill, VIC, Australia
Venue:
BioDM'06: Proceedings of the 2006 International Conference on Data Mining for Biomedical Applications
Year:
2006

Abstract

Microarray data are used in a wide range of applications, from diagnosis to drug discovery. Such data, however, often contain multiple missing gene expression values, which are generally ignored, degrading the reliability of inferred results. This paper presents an innovative and robust imputation framework that estimates missing values more accurately, leading in turn to better gene selection and class prediction. To test this premise, several missing value imputation techniques are analysed: Collateral Missing Value Estimation (CMVE), Bayesian Principal Component Analysis (BPCA), Least Squares Impute (LSImpute), k-Nearest Neighbour (KNN) and ZeroImpute. Gene selection is then performed with a combination of univariate and multiple-gene selection methods, namely Between-Group to Within-Group Sum of Squares and Weighted Partial Least Squares, before class prediction is applied using the Ridge Partial Least Squares method. Overall, CMVE consistently provided more accurate missing value estimates than the other algorithms examined, by virtue of exploiting local and global as well as positive and negative correlations between genes, with all empirical results corroborated by the two-sided Wilcoxon rank-sum statistical significance test.
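The pipeline described in the abstract (impute, measure estimation error, then rank genes) can be illustrated with a minimal sketch. It covers only the two simplest baselines named above, ZeroImpute and KNN, plus a between-group to within-group sum of squares score; it is not an implementation of CMVE, BPCA or LSImpute. The function names, the synthetic data, the Euclidean-style distance and the error normalisation (RMS error divided by the standard deviation of the true values) are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def zero_impute(X):
    """ZeroImpute baseline: replace every missing entry (NaN) with 0."""
    return np.where(np.isnan(X), 0.0, X)

def knn_impute(X, k=3):
    """KNN baseline (illustrative): fill a gene's missing entries with the
    mean of the same samples over its k nearest genes (rows)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        miss = np.isnan(X[i])
        candidates = []
        for j in range(X.shape[0]):
            if j == i or np.isnan(X[j, miss]).any():
                continue  # neighbour must be observed where gene i is missing
            shared = ~miss & ~np.isnan(X[j])
            if not shared.any():
                continue
            # root mean squared difference over samples observed in both genes
            dist = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
            candidates.append((dist, j))
        if not candidates:
            filled[i, miss] = np.nanmean(X[i])  # fallback: gene's own mean
            continue
        candidates.sort()
        neighbours = [j for _, j in candidates[:k]]
        filled[i, miss] = X[neighbours][:, miss].mean(axis=0)
    return filled

def nrmse(truth, estimate, mask):
    """RMS error over the masked entries, normalised by the standard
    deviation of the true values (one of several conventions)."""
    diff = truth[mask] - estimate[mask]
    return np.sqrt(np.mean(diff ** 2)) / np.std(truth[mask])

def bss_wss(X, labels):
    """Per-gene ratio of between-group to within-group sum of squares.
    Rows are genes, columns are samples; higher means more discriminative."""
    labels = np.asarray(labels)
    overall = X.mean(axis=1, keepdims=True)
    bss = np.zeros(X.shape[0])
    wss = np.zeros(X.shape[0])
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        class_mean = Xc.mean(axis=1, keepdims=True)
        bss += Xc.shape[1] * (class_mean - overall).ravel() ** 2
        wss += ((Xc - class_mean) ** 2).sum(axis=1)
    return bss / wss

# Synthetic example (hypothetical data): 8 correlated genes x 6 samples.
rng = np.random.default_rng(0)
pattern = np.linspace(1.0, 6.0, 6)
truth = pattern + rng.normal(scale=0.1, size=(8, 6))
data = truth.copy()
for row, col in [(0, 1), (3, 4), (5, 2)]:
    data[row, col] = np.nan  # knock out a few known values
mask = np.isnan(data)

err_zero = nrmse(truth, zero_impute(data), mask)
err_knn = nrmse(truth, knn_impute(data, k=3), mask)
scores = bss_wss(knn_impute(data, k=3), labels=[0, 0, 0, 1, 1, 1])
print(f"NRMSE ZeroImpute={err_zero:.3f}  KNN={err_knn:.3f}")
print("Top-ranked gene index:", int(np.argmax(scores)))
```

Knocking out entries whose true values are known, as done here, is a common way to compare imputation methods: each method is scored on how closely it recovers the hidden values before any downstream gene selection or classification is run on the completed matrix.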