Heuristic Non Parametric Collateral Missing Value Imputation: A Step Towards Robust Post-genomic Knowledge Discovery

  • Authors:
  • Muhammad Shoaib Sehgal;Iqbal Gondal;Laurence S. Dooley;Ross Coppel

  • Affiliations:
  • ARC Centre of Excellence in Bioinformatics at IMB, University of Queensland, St Lucia, Australia QLD 4067;Faculty of Information Technology, Monash University, Churchill, Australia VIC. 3842;Department of Communications and Systems, The Open University, Milton Keynes, United Kingdom MK7 6AA;Department of Microbiology, and Victorian Bioinformatics Consortium, Clayton, Australia VIC. 3800

  • Venue:
  • PRIB '08 Proceedings of the Third IAPR International Conference on Pattern Recognition in Bioinformatics
  • Year:
  • 2008

Abstract

Microarrays measure the expression patterns of thousands of genes in a genome, giving profiles that facilitate much faster analysis of biological processes for diagnosis, prognosis and tailored drug discovery. Microarrays, however, commonly contain missing values, which can result in erroneous downstream analysis. To impute these missing values, various algorithms have been proposed, including Collateral Missing Value Estimation (CMVE), Bayesian Principal Component Analysis (BPCA), Least Square Impute (LSImpute), Local Least Square Impute (LLSImpute) and K-Nearest Neighbour (KNN). Most of these imputation algorithms exploit either the global or the local correlation structure of the data, but not both, which normally leads to larger estimation errors. This paper presents an enhanced Heuristic Non Parametric Collateral Missing Value Imputation (HCMVI) algorithm, which uses CMVE as its core estimator and a heuristic non-parametric strategy to compute the optimal number of estimator genes, so that both local and global correlations are exploited.
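To illustrate what "estimator genes" means in practice, the following is a minimal sketch of KNN imputation, the simplest of the baselines named in the abstract. It is not HCMVI or CMVE: the abstract does not describe how those algorithms work internally, so nothing here should be read as the authors' method. The function name `knn_impute`, the fixed parameter `k`, the use of Euclidean distance, and the restriction to complete genes as candidates are all assumptions made for this example. HCMVI is stated to choose the number of estimator genes heuristically, whereas this sketch leaves `k` to the caller.

```python
import math


def knn_impute(expr, k=2):
    """Fill missing entries (NaN) in a genes x samples matrix.

    Assumption for this sketch: each missing value is replaced by the mean
    of that sample's values in the k complete genes closest to the target
    gene, using Euclidean distance over the samples where the target gene
    is observed.
    """
    filled = [row[:] for row in expr]
    # Candidate estimator genes: rows with no missing values.
    complete = [
        i for i, row in enumerate(expr)
        if not any(math.isnan(v) for v in row)
    ]
    for g, row in enumerate(expr):
        missing = [j for j, v in enumerate(row) if math.isnan(v)]
        if not missing:
            continue
        observed = [j for j in range(len(row)) if j not in missing]
        # Rank candidates by distance to gene g on its observed samples.
        ranked = sorted(
            complete,
            key=lambda c: math.sqrt(
                sum((expr[c][j] - row[j]) ** 2 for j in observed)
            ),
        )
        nearest = ranked[:k]
        if not nearest:
            continue  # no complete gene available; leave the gap as NaN
        for j in missing:
            filled[g][j] = sum(expr[c][j] for c in nearest) / len(nearest)
    return filled


# Toy expression matrix: 4 genes x 4 samples, one missing value.
toy = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, math.nan],
    [10.0, 20.0, 30.0, 40.0],
    [1.1, 2.1, 3.1, 4.2],
]
result = knn_impute(toy, k=2)
print(result[1])
```

In this toy matrix the two genes closest to the incomplete one are the first and the last, so the gap is filled from their values and the distant third gene is ignored. Choosing `k` too small or too large changes which genes contribute, which is the selection problem the abstract says HCMVI addresses heuristically.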