Gene Co-Adaboost: a semi-supervised approach for classifying gene expression data

  • Authors:
  • Nan Du;Kang Li;Supriya D. Mahajan;Stanley A. Schwartz;Bindukumar B. Nair;Chiu Bin Hsiao;Aidong Zhang

  • Affiliations:
  • State University of New York at Buffalo;State University of New York at Buffalo;State University of New York at Buffalo;State University of New York at Buffalo;State University of New York at Buffalo;State University of New York at Buffalo;State University of New York at Buffalo

  • Venue:
  • Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
  • Year:
  • 2011

Abstract

Co-training has proven successful in classifying many kinds of data, such as text and web data, that have naturally split views. By using each view as a separate feature set, classifiers can make fewer generalization errors by maximizing their agreement over the unlabeled data. However, this method has limited performance on gene expression data, for two reasons. First, most gene expression data lacks naturally split views. Second, gene expression datasets usually contain some noisy samples, and some semi-supervised algorithms tend to add misclassified samples to the training set, which misleads the classifier. In this paper, a Co-training-based algorithm named Gene Co-Adaboost is proposed that uses a limited number of labeled gene expression samples to predict the class variables. The method splits the gene features into relatively independent views and keeps performance stable by using a Cascade Judgment technique to refuse to add unlabeled examples that may be wrongly labeled to the training set. Experiments on four public microarray datasets indicate that Gene Co-Adaboost effectively uses the unlabeled samples to improve classification accuracy.
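
The general idea of co-training over two feature views, with a filter that refuses doubtful pseudo-labels, can be illustrated with a minimal sketch. This is not the Gene Co-Adaboost algorithm: the abstract does not specify how the views are split or how Cascade Judgment works, so the sketch assumes the views are given as column index lists, swaps in a simple nearest-centroid classifier in place of AdaBoost, and uses a hypothetical agreement-plus-margin rule (`min_margin`) as a stand-in for the rejection step. All function and parameter names are illustrative.

```python
def centroid_fit(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_predict(model, row):
    """Return (label, margin), where margin is the distance gap
    between the nearest and second-nearest class centroids."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, mu)) ** 0.5, c)
        for c, mu in model.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def take(X, cols):
    """Project every sample onto the given feature columns (one view)."""
    return [[row[j] for j in cols] for row in X]


def co_train(X_lab, y_lab, X_unl, view_a, view_b, min_margin=1.0, rounds=5):
    """Grow the labeled set with unlabeled samples that both views agree on.

    Assumed rejection rule (not the paper's Cascade Judgment): a sample is
    pseudo-labeled only if both view classifiers predict the same class and
    each does so with a margin of at least min_margin.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(rounds):
        model_a = centroid_fit(take(X_lab, view_a), y_lab)
        model_b = centroid_fit(take(X_lab, view_b), y_lab)
        kept, rest = [], []
        for row in pool:
            label_a, margin_a = centroid_predict(model_a, [row[j] for j in view_a])
            label_b, margin_b = centroid_predict(model_b, [row[j] for j in view_b])
            if label_a == label_b and min(margin_a, margin_b) >= min_margin:
                kept.append((row, label_a))
            else:
                rest.append(row)  # doubtful sample stays unlabeled
        if not kept:
            break  # nothing confident enough to add; stop early
        for row, label in kept:
            X_lab.append(row)
            y_lab.append(label)
        pool = rest
    return X_lab, y_lab, pool


# Toy data: 4 "genes" per sample; view A = genes 0-1, view B = genes 2-3.
labeled = [[0, 0, 0, 0], [4, 4, 4, 4]]
labels = [0, 1]
unlabeled = [[0.2, 0.1, 0.0, 0.3], [3.9, 4.1, 4.0, 3.8], [0, 0, 4, 4]]
X_out, y_out, leftover = co_train(labeled, labels, unlabeled, [0, 1], [2, 3])
```

In this toy run the first two unlabeled samples are added with pseudo-labels, while the third, on which the two views disagree, is left in the unlabeled pool. That is the behavior a rejection step is meant to provide when samples are noisy.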