Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data

Authors:
Muhammad Shoaib B. Sehgal; Iqbal Gondal; Laurence S. Dooley
Affiliations:
Gippsland School of Computing and Information Technology, Monash University VIC 3842, Australia (all authors)
Venue:
Bioinformatics
Year:
2005


Abstract

Motivation: Microarray data are used in a range of application areas in biology, but they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms, so there is a strong motivation to estimate them as accurately as possible before applying these algorithms. While many imputation algorithms have been proposed, more robust techniques are needed so that further analysis of biological data can be undertaken accurately. In this paper, an innovative missing value imputation algorithm called collateral missing value estimation (CMVE) is presented, which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least squares regression and linear programming methods.

Results: The new CMVE algorithm has been compared with existing estimation techniques including Bayesian principal component analysis imputation (BPCA), least square impute (LSImpute) and K-nearest neighbour (KNN). All these methods were rigorously tested on estimating missing values in three separate non-time-series datasets (ovarian cancer based) and one time-series dataset (yeast sporulation). Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, with missing values introduced at random at probabilities ranging from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which comprised 1.7% actual missing values, to test the hypothesis that CMVE performed better not only for randomly occurring missing values but also for a real distribution of missing values. The results confirmed that CMVE consistently estimated missing values more accurately and more robustly than the other methods for both time-series and non-time-series data, at the same order of computational complexity. A concise theoretical framework has also been formulated to validate the improved performance of the CMVE algorithm.

Availability: The CMVE software is available upon request from the authors.

Contact: Shoaib.Sehgal@infotech.monash.edu.au
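The evaluation protocol described in the Results (introduce missing values at random with a given probability, impute them, then score the estimate with the NRMS error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not define the NRMS error, so the sketch assumes the common form RMS(M - M_est) / RMS(M); row-mean imputation is used purely as a stand-in estimator and is not CMVE, BPCA, LSImpute or KNN; the function names and the synthetic Gaussian matrix are hypothetical.

```python
import math
import random


def mask_at_random(matrix, p, rng):
    """Return a copy of matrix where each entry becomes None with probability p."""
    return [[None if rng.random() < p else v for v in row] for row in matrix]


def row_mean_impute(masked):
    """Fill each missing entry with the mean of the observed values in its row.

    Stand-in baseline for illustration only; this is NOT the CMVE algorithm.
    A row with no observed values is filled with 0.0 (an arbitrary choice).
    """
    filled = []
    for row in masked:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def nrms_error(original, estimated):
    """Assumed definition: RMS(original - estimated) / RMS(original).

    Both RMS terms average over the same number of entries, so the ratio
    reduces to sqrt(sum of squared differences / sum of squared originals).
    """
    diff_sq = sum(
        (o - e) ** 2
        for row_o, row_e in zip(original, estimated)
        for o, e in zip(row_o, row_e)
    )
    orig_sq = sum(o ** 2 for row_o in original for o in row_o)
    return math.sqrt(diff_sq / orig_sq)


if __name__ == "__main__":
    rng = random.Random(0)
    genes, samples = 50, 12  # hypothetical matrix size (rows = genes)
    data = [[rng.gauss(0.0, 1.0) for _ in range(samples)] for _ in range(genes)]

    # Sweep missing-value probabilities within the 0.01 to 0.2 range from the abstract.
    for p in (0.01, 0.05, 0.1, 0.2):
        masked = mask_at_random(data, p, rng)
        err = nrms_error(data, row_mean_impute(masked))
        print(f"p={p:.2f}  NRMS={err:.4f}")
```

Any imputation method can be compared under this protocol by replacing `row_mean_impute` with another estimator that takes the masked matrix and returns a fully filled one; a lower NRMS error indicates an estimate closer to the original matrix.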