C4.5: programs for machine learning
C4.5: programs for machine learning
Combining labeled and unlabeled data with co-training
COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Analyzing the effectiveness and applicability of co-training
Proceedings of the ninth international conference on Information and knowledge management
A perspective view and survey of meta-learning
Artificial Intelligence Review
Enhancing Supervised Learning with Unlabeled Data
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Applying co-training methods to statistical parsing
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Tri-Training: Exploiting Unlabeled Data Using Three Classifiers
IEEE Transactions on Knowledge and Data Engineering
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Instance selection in semi-supervised learning
Canadian AI'11 Proceedings of the 24th Canadian conference on Advances in artificial intelligence
A case study of stacked multi-view learning in dementia research
AIME'11 Proceedings of the 13th conference on Artificial intelligence in medicine
PSL'11 Proceedings of the First IAPR TC3 conference on Partially Supervised Learning
Hi-index | 0.00 |
Co-training, a paradigm of semi-supervised learning, may alleviate effectively the data scarcity problem (i.e., the lack of labeled examples) in supervised learning. The standard two-view co-training requires the dataset be described by two views of attributes, and previous theoretical studies proved that if the two views satisfy the sufficiency and independence assumptions, co-training is guaranteed to work well. However, little work has been done on how these assumptions can be empirically verified given datasets. In this paper, we first propose novel approaches to verify empirically the two assumptions of co-training based on datasets. We then propose simple heuristic to split a single view of attributes into two views, and discover regularity on the sufficiency and independence thresholds for the standard two-view co-training to work well. Our empirical results not only coincide well with the previous theoretical findings, but also provide a practical guideline to decide when co-training should work well based on datasets.