Consensus analysis of multiple classifiers using non-repetitive variables: Diagnostic application to microarray gene expression data

  • Authors:
  • Zhenqiang Su;Huixiao Hong;Roger Perkins;Xueguang Shao;Wensheng Cai;Weida Tong

  • Affiliations:
  • Department of Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, China and Center for Toxicoinformatics, National Center for Toxicological Research (NCTR), US Food and ...;Division of Bioinformatics, Z-Tech at FDA's National Center for Toxicological Research, Jefferson, AR 72079, USA;Division of Bioinformatics, Z-Tech at FDA's National Center for Toxicological Research, Jefferson, AR 72079, USA;Department of Chemistry, Nankai University, Tianjin 300071, China;Department of Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, China and Department of Chemistry, Nankai University, Tianjin 300071, China;Center for Toxicoinformatics, National Center for Toxicological Research (NCTR), US Food and Drug Administration (FDA), 3900 NCTR Road, HFT 020, Jefferson, AR 72079, USA

  • Venue:
  • Computational Biology and Chemistry
  • Year:
  • 2007

Abstract

Class prediction based on DNA microarray data has emerged as one of the most important applications of bioinformatics for diagnostics and prognostics. Robust classifiers are needed that use the most biologically relevant genes embedded in the data. A consensus approach that combines multiple classifiers has attributes that mitigate this difficulty better than a single classifier does. A new classification method, named consensus analysis of multiple classifiers using non-repetitive variables (CAMCUN), was proposed for the analysis of hyper-dimensional gene expression data. The CAMCUN method combined multiple classifiers, each of which was built from distinct, non-repeated genes that were selected for their effectiveness in class differentiation. Thus, CAMCUN utilized the most biologically relevant genes in the final classifier. The CAMCUN algorithm was demonstrated to give consistently more accurate predictions for two well-known datasets, one for prostate cancer and one for leukemia. Importantly, the CAMCUN algorithm employed an integrated 10-fold cross-validation and randomization test to assess the degree of confidence of the predictions for unknown samples.
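
The central idea of the abstract, several classifiers that each use their own non-overlapping set of discriminating genes and are then combined by consensus, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the base classifier, the gene-ranking criterion, or how genes are assigned to classifiers, so the nearest-centroid classifier, the signal-to-noise score, the round-robin assignment, the majority vote, and all function names and the toy data below are assumptions made purely for illustration.

```python
import random
from statistics import mean, pstdev


def rank_genes(X, y):
    """Rank gene indices by a signal-to-noise score (assumed criterion)."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        spread = (pstdev(a) + pstdev(b)) or 1e-12  # guard against zero spread
        scores.append(abs(mean(a) - mean(b)) / spread)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)


def disjoint_gene_sets(ranked, n_classifiers, genes_per_classifier):
    """Deal the top-ranked genes round-robin so that no gene is used twice."""
    top = ranked[: n_classifiers * genes_per_classifier]
    return [top[i::n_classifiers] for i in range(n_classifiers)]


def fit_centroids(X, y, genes):
    """Fit a nearest-centroid base classifier on one gene set (assumed model)."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[g] for r in rows) for g in genes]
    return centroids


def predict_one(centroids, genes, sample):
    """Return the label whose centroid is closest in squared distance."""
    def dist(label):
        return sum((sample[g] - c) ** 2 for g, c in zip(genes, centroids[label]))
    return min((0, 1), key=dist)


def consensus_predict(X, y, sample, n_classifiers=5, genes_per_classifier=3):
    """Majority vote over classifiers built on non-repeated genes."""
    gene_sets = disjoint_gene_sets(rank_genes(X, y), n_classifiers,
                                   genes_per_classifier)
    votes = [predict_one(fit_centroids(X, y, gs), gs, sample)
             for gs in gene_sets]
    label = 1 if 2 * sum(votes) > len(votes) else 0
    agreement = votes.count(label) / len(votes)  # fraction of agreeing voters
    return label, agreement, gene_sets


def make_toy_data(n_samples=40, n_genes=50, n_informative=15, seed=0):
    """Synthetic data: the first n_informative genes are shifted in class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    unknown = [3.0] * 15 + [0.0] * 35  # resembles class 1 on informative genes
    label, agreement, gene_sets = consensus_predict(X, y, unknown)
    print("predicted class:", label, "vote agreement:", agreement)
```

An odd number of classifiers avoids tied votes in this two-class setting. The vote agreement reported here is only a crude stand-in for prediction confidence; the cross-validation and randomization test described in the abstract are not reproduced in this sketch.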