Gaussian mixture clustering and imputation of microarray data

  • Authors:
  • Ming Ouyang; William J. Welsh; Panos Georgopoulos

  • Affiliations:
  • Environmental and Occupational Health Sciences Institute, UMDNJ--Robert Wood Johnson Medical School and Rutgers, The State University of New Jersey, 170 Frelinghuysen Road, Piscataway, NJ 08854, U ...;Department of Pharmacology, UMDNJ--Robert Wood Johnson Medical School and Informatics Institute, University of Medicine and Dentistry of New Jersey, 675 Hoes Lane, Piscataway, NJ 08854, USA;Environmental and Occupational Health Sciences Institute, UMDNJ--Robert Wood Johnson Medical School and Rutgers, The State University of New Jersey, 170 Frelinghuysen Road, Piscataway, NJ 08854, U ...

  • Venue:
  • Bioinformatics
  • Year:
  • 2004

Abstract

Motivation: In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries, and more than 90% of the genes are affected. Many analysis methods require complete data, so either the genes with missing entries are excluded or the missing entries are filled with estimates before the analysis. This study compares methods of missing value estimation.

Results: Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between clustering with true values and clustering with imputed values; it examines the bias that imputation introduces into clustering. Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods, according to both evaluation metrics, on both time-series (correlated) and non-time-series (uncorrelated) data sets.

Availability: Matlab code is available on request from the authors.
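
The first evaluation metric can be illustrated with a minimal Python sketch. It is not the authors' Matlab code. The function name `imputation_rmse`, the boolean mask layout, and the choice to score only the entries marked as missing are all assumptions made for illustration; the abstract does not say whether the paper normalizes the error.

```python
import math


def imputation_rmse(true_rows, imputed_rows, missing_mask):
    """Root mean squared error between true and imputed values.

    Assumption: only entries flagged True in missing_mask are scored,
    since those are the entries an imputation method has to estimate.
    """
    squared_errors = [
        (t - p) ** 2
        for t_row, p_row, m_row in zip(true_rows, imputed_rows, missing_mask)
        for t, p, m in zip(t_row, p_row, m_row)
        if m
    ]
    if not squared_errors:
        raise ValueError("no missing entries to score")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy expression matrix (rows = genes, columns = chips); values are made up.
true_values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
mask = [[False, True, False], [True, False, False]]
imputed = [[1.0, 2.5, 3.0], [3.5, 5.0, 6.0]]

rmse = imputation_rmse(true_values, imputed, mask)
print(rmse)
```

In this toy case both masked entries are off by 0.5, so the score is 0.5. A lower value means the imputed entries are closer to the true ones.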