Microarray data are used in many biomedical experiments. They often contain missing values, which significantly affect statistical algorithms. Although a number of imputation algorithms have been proposed, they are limited in how effectively they exploit local and global information for estimation, so more effective imputation techniques are needed. In this paper, we propose a theoretical framework of local weighted approximation for missing value estimation, based on the Taylor series approximation. Besides showing that k-nearest neighbor imputation (KNNimpute) is a special case of the framework, we study its linear case, local weighted linear approximation imputation (LWLAimpute), from theory to experiment. Experimental results show that LWLAimpute and its iterative version can achieve better performance than some existing imputation methods, and that the advantage becomes more significant as the level of missing values increases.
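To make the relationship between the two estimators concrete, the following is a minimal Python sketch, not the paper's algorithm. The abstract does not give the weighting function, distance measure, or neighbor selection rule, so the Euclidean distance, the inverse-distance weights, the restriction to complete rows as neighbors, and the small ridge term are all assumptions, and the function names are hypothetical. The zeroth-order (constant) local fit reduces to a weighted k-nearest-neighbor average, while the first-order (linear) local fit also estimates a gradient around the query point, in the spirit of a truncated Taylor expansion.

```python
import numpy as np

def _neighbors(X, i, j, k):
    """Find the k complete rows closest to row i (hypothetical helper).

    Distances use only the columns observed in row i, excluding column j.
    """
    obs = ~np.isnan(X[i])
    obs[j] = False
    complete = np.where(~np.isnan(X).any(axis=1))[0]
    d = np.linalg.norm(X[complete][:, obs] - X[i, obs], axis=1)
    order = np.argsort(d)[:k]
    return complete[order], d[order], obs

def knn_impute_entry(X, i, j, k=4, eps=1e-8):
    """Zeroth-order local fit: inverse-distance weighted mean of neighbors."""
    idx, d, _ = _neighbors(X, i, j, k)
    w = 1.0 / (d + eps)  # assumed weighting scheme
    return float(np.sum(w * X[idx, j]) / np.sum(w))

def lwla_impute_entry(X, i, j, k=4, eps=1e-8, ridge=1e-6):
    """First-order local fit: weighted least squares around the query row.

    Neighbor features are centered on the query, so the fitted intercept
    is the estimate of the missing entry.
    """
    idx, d, obs = _neighbors(X, i, j, k)
    w = 1.0 / (d + eps)  # assumed weighting scheme
    A = np.hstack([np.ones((len(idx), 1)), X[idx][:, obs] - X[i, obs]])
    y = X[idx, j]
    W = np.diag(w)
    R = ridge * np.eye(A.shape[1])  # assumed stabilizer, not from the paper
    R[0, 0] = 0.0                   # do not shrink the intercept
    beta = np.linalg.solve(A.T @ W @ A + R, A.T @ W @ y)
    return float(beta[0])

# Toy data: column 2 equals 2*a - b + 1, with one missing entry in the last row.
X = np.array([
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 3.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 2.0],
    [2.0, 1.0, 4.0],
    [0.2, 0.1, np.nan],
])
knn_est = knn_impute_entry(X, 5, 2)
lwla_est = lwla_impute_entry(X, 5, 2)
print(knn_est, lwla_est)
```

On this toy matrix the missing entry is an exact linear function of the other two columns, so the linear fit can recover it almost exactly, while the weighted average is only constrained to lie within the range of the neighbor values. Real expression data are noisy, and no conclusion about relative accuracy should be drawn from this example.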