The unsymmetrical-style co-training

  • Authors:
  • Bin Wang (Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada)
  • Harry Zhang (Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada)
  • Bruce Spencer (National Research Council of Canada, Fredericton, NB, Canada)
  • Yuanyuan Guo (Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada)

  • Venue:
  • PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
  • Year:
  • 2011

Abstract

Semi-supervised learning has attracted much attention over the past decade because it combines unlabeled data with labeled data to improve the learning capability of models. Co-training is a representative paradigm of semi-supervised learning. Some co-training style algorithms, such as co-training and co-EM, learn two classifiers based on two views of the instance space, but they must satisfy the assumptions that the two views are sufficient and conditionally independent given the class labels. Other co-training style algorithms, such as the multiple-learner approach, use two different underlying classifiers based on a single view of the instance space; however, they cannot utilize labeled data effectively and suffer from early convergence. After analyzing various co-training style algorithms, we find that all of them have symmetrical framework structures that are tied to their constraints. In this paper, we propose a novel unsymmetrical-style method, which we call the unsymmetrical co-training algorithm. It combines the advantages of other co-training style algorithms while overcoming their disadvantages. Within our unsymmetrical structure, we apply two unsymmetrical classifiers, namely a self-training classifier and an EM classifier, and train them in an unsymmetrical way. The unsymmetrical co-training algorithm not only avoids the conditional independence assumption, but also overcomes early convergence and the ineffective utilization of labeled data. We conduct experiments to compare the performance of these co-training style algorithms, and the results show that the unsymmetrical co-training algorithm outperforms the other co-training style algorithms.
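
As a rough illustration of the asymmetry described in the abstract, the sketch below pairs a self-training classifier (which promotes only its own most confident pseudo-labels into the labeled pool) with an EM-style classifier (which is refit on labels inferred for all remaining unlabeled data). The base learners (GaussianNB), the selection size, and the loop structure are assumptions chosen for illustration only; this is not the authors' exact procedure.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def unsymmetrical_co_training(X_l, y_l, X_u, rounds=10, per_round=20):
    """Hypothetical sketch: one self-trained classifier and one
    EM-style classifier trained in an unsymmetrical fashion."""
    self_clf = GaussianNB()  # self-training classifier (assumed base learner)
    em_clf = GaussianNB()    # EM-style classifier (assumed base learner)

    X_train, y_train = X_l.copy(), y_l.copy()
    unlabeled = X_u.copy()

    for _ in range(rounds):
        if len(unlabeled) == 0:
            break
        self_clf.fit(X_train, y_train)

        # Self-training step: move only the most confident unlabeled
        # instances, with their predicted labels, into the labeled pool.
        proba = self_clf.predict_proba(unlabeled)
        conf = proba.max(axis=1)
        top = np.argsort(conf)[-per_round:]
        new_labels = self_clf.classes_[proba[top].argmax(axis=1)]
        X_train = np.vstack([X_train, unlabeled[top]])
        y_train = np.concatenate([y_train, new_labels])
        unlabeled = np.delete(unlabeled, top, axis=0)

        # EM-style step: refit the second classifier on the labeled pool
        # plus labels inferred for every remaining unlabeled instance
        # (a crude stand-in for one EM iteration with posteriors).
        if len(unlabeled) > 0:
            pseudo = self_clf.predict(unlabeled)
            em_clf.fit(np.vstack([X_train, unlabeled]),
                       np.concatenate([y_train, pseudo]))
        else:
            em_clf.fit(X_train, y_train)

    return self_clf, em_clf
```

The point of the sketch is the unequal treatment of the two learners: one grows the labeled pool incrementally, the other always consumes the entire unlabeled set, so the two never play symmetric roles.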