Systematic benchmarking of microarray data feature extraction and classification

  • Authors:
  • Jing Zhang;Tianzi Jiang;Bing Liu;Xingpeng Jiang;Huizhi Zhao

  • Affiliations:
  • LIAMA, Sino French Laboratory in Computer Science, Automation and Applied Mathematics, Institute of Automation, Chinese Academy of Sciences, Beijing, P.R. China (all authors)

  • Venue:
  • International Journal of Computer Mathematics
  • Year:
  • 2008

Abstract

A combination of microarrays with classification methods is a promising approach to supporting clinical management decisions in oncology. The aim of this paper is to systematically benchmark the role of classification models. Each classification model is a combination of one feature extraction method and one classification method. We consider four feature extraction methods and five classification methods, from which 20 classification models can be derived. The feature extraction methods are t-statistics, non-parametric Wilcoxon statistics, ad hoc signal-to-noise statistics, and principal component analysis (PCA), and the classification methods are Fisher linear discriminant analysis (FLDA), the support vector machine (SVM), the k nearest-neighbour classifier (kNN), diagonal linear discriminant analysis (DLDA), and diagonal quadratic discriminant analysis (DQDA). Twenty randomizations of each of three binary cancer classification problems derived from publicly available datasets are examined. PCA plus FLDA is found to be the optimal classification model.
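
To make the combinatorial design concrete, the following minimal Python sketch enumerates the 20 models and illustrates one of the feature extraction steps. It is an illustration under stated assumptions, not the authors' implementation: the function names (`enumerate_models`, `signal_to_noise`, `rank_genes`) are hypothetical, and the signal-to-noise score is assumed to take the commonly used form (mean1 - mean2) / (sd1 + sd2), which the abstract does not spell out.

```python
from itertools import product
from statistics import mean, stdev

# Method names exactly as listed in the abstract.
FEATURE_METHODS = ["t-statistic", "Wilcoxon", "signal-to-noise", "PCA"]
CLASSIFIERS = ["FLDA", "SVM", "kNN", "DLDA", "DQDA"]


def enumerate_models():
    """Return every (feature extraction, classifier) pairing."""
    return list(product(FEATURE_METHODS, CLASSIFIERS))


def signal_to_noise(pos, neg):
    """Score one gene as (mean1 - mean2) / (sd1 + sd2).

    Assumed form of the statistic; raises ZeroDivisionError if both
    classes have zero spread for this gene.
    """
    return (mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg))


def rank_genes(samples, labels, top_k):
    """Return indices of the top_k genes by absolute signal-to-noise.

    samples: list of expression vectors (one per sample).
    labels:  0/1 class label per sample (binary problem).
    """
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        pos = [s[g] for s, y in zip(samples, labels) if y == 1]
        neg = [s[g] for s, y in zip(samples, labels) if y == 0]
        scores.append(abs(signal_to_noise(pos, neg)))
    # Highest-scoring genes first.
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order[:top_k]
```

Calling `enumerate_models()` yields the 4 x 5 = 20 pairings, including `("PCA", "FLDA")`. In a full benchmark, each pairing would be trained and evaluated on every randomized split of each dataset; that loop is omitted here because the abstract does not describe the splitting protocol.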