Co-training has proven successful in classifying many kinds of data, such as text and web data, that have naturally split views. Using each view as a separate feature set, classifiers can make fewer generalization errors by maximizing their agreement over the unlabeled data. However, this method has limited performance on gene expression data. The first reason is that most gene expression data lacks naturally split views. The second reason is that gene expression datasets usually contain some noisy samples. Furthermore, some semi-supervised algorithms tend to add these samples to the training set with wrong labels, which misleads the classifier. In this paper, a co-training-based algorithm named Gene Co-Adaboost is proposed that uses a limited number of labeled gene expression samples to predict the class variables. The method splits the gene features into relatively independent views and keeps performance stable by using a Cascade Judgment technique to refuse to add unlabeled examples that may be wrongly labeled to the training set. Experiments on four public microarray datasets indicate that Gene Co-Adaboost effectively uses the unlabeled samples to improve classification accuracy.
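The general idea of co-training over split feature views, with a filter that refuses doubtful pseudo-labels, can be illustrated with a minimal sketch. This is not the Gene Co-Adaboost algorithm itself: the abstract does not specify how the views are chosen or how Cascade Judgment works, so everything here is an assumption made for illustration. A nearest-centroid classifier stands in for AdaBoost, the views are hand-picked column index lists, and the acceptance rule (both views agree and each has a distance margin of at least `min_margin`) is a hypothetical stand-in for the paper's filter. The names `centroids`, `predict`, `co_train`, and `min_margin` are likewise invented for this sketch.

```python
def centroids(X, y, view):
    """Per-class mean of the columns listed in `view`."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(view))
        for k, j in enumerate(view):
            acc[k] += row[j]
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(cents, row, view):
    """Return (label, margin): the nearest centroid and the gap to the runner-up."""
    dists = sorted(
        (sum((row[j] - c[k]) ** 2 for k, j in enumerate(view)), label)
        for label, c in cents.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def co_train(X_lab, y_lab, X_unl, views, min_margin=1.0, rounds=5):
    """Grow the labeled set with unlabeled rows that every view labels confidently."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(rounds):
        # One stand-in classifier per feature view, refit each round.
        models = [centroids(X_lab, y_lab, v) for v in views]
        accepted, rest = [], []
        for row in pool:
            preds = [predict(m, row, v) for m, v in zip(models, views)]
            agree = len({label for label, _ in preds}) == 1
            confident = all(margin >= min_margin for _, margin in preds)
            if agree and confident:
                accepted.append((row, preds[0][0]))
            else:
                rest.append(row)  # assumed filter: doubtful rows stay unlabeled
        if not accepted:
            break
        for row, label in accepted:
            X_lab.append(row)
            y_lab.append(label)
        pool = rest
    return X_lab, y_lab, pool


# Toy data: six "gene" features split into two hand-picked views.
views = [[0, 1, 2], [3, 4, 5]]
X_lab = [[0, 0, 0, 0, 0, 0], [3, 3, 3, 3, 3, 3]]
y_lab = [0, 1]
X_unl = [
    [0.2, 0.1, -0.1, 0.0, 0.3, 0.1],  # resembles class 0 in both views
    [2.9, 3.1, 3.0, 2.8, 3.2, 3.0],   # resembles class 1 in both views
    [0, 0, 0, 3, 3, 3],               # views disagree, so it should be refused
]
X_new, y_new, pool = co_train(X_lab, y_lab, X_unl, views)
```

In this toy run the two consistent rows are added with pseudo-labels, while the row whose views disagree is left in the unlabeled pool. A real setting would replace the centroid classifier with a boosted ensemble and would need a principled way to split genes into views.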