When does Co-training Work in Real Data?

  • Authors:
  • Charles X. Ling; Jun Du; Zhi-Hua Zhou

  • Affiliations:
  • Department of Computer Science, The University of Western Ontario, London, Canada N6A 5B7; Department of Computer Science, The University of Western Ontario, London, Canada N6A 5B7; National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 210093

  • Venue:
  • PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2009


Abstract

Co-training, a paradigm of semi-supervised learning, can effectively alleviate the data scarcity problem (i.e., the lack of labeled examples) in supervised learning. Standard two-view co-training requires that the dataset be described by two views of attributes, and previous theoretical studies proved that if the two views satisfy the sufficiency and independence assumptions, co-training is guaranteed to work well. However, little work has been done on how these assumptions can be empirically verified on given datasets. In this paper, we first propose novel approaches to empirically verify the two assumptions of co-training on datasets. We then propose a simple heuristic to split a single view of attributes into two views, and discover a regularity in the sufficiency and independence thresholds for standard two-view co-training to work well. Our empirical results not only coincide well with the previous theoretical findings, but also provide a practical guideline for deciding, based on the dataset, when co-training should work well.
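
For readers unfamiliar with the procedure the abstract builds on, the following is a minimal sketch of standard two-view co-training. The abstract does not detail the paper's splitting heuristic or verification tests, so this sketch assumes a simple random split of the attributes into two views and Gaussian naive Bayes as the per-view learner; the function name `co_train` and all parameters are illustrative, not the authors' method.

```python
# Minimal sketch of standard two-view co-training.
# Assumptions (not from the paper): a random attribute split forms the two
# views, GaussianNB is the base learner, and each round each view labels
# its n_add most confident unlabeled examples for the shared pool.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X, y, labeled_idx, n_rounds=30, n_add=2, seed=None):
    rng = np.random.default_rng(seed)
    # Split the single attribute set into two disjoint views at random.
    perm = rng.permutation(X.shape[1])
    views = (perm[: len(perm) // 2], perm[len(perm) // 2:])

    labeled = set(labeled_idx)
    unlabeled = [i for i in range(len(y)) if i not in labeled]
    y_work = y.copy()  # pseudo-labels overwrite entries of unlabeled points

    clfs = [GaussianNB(), GaussianNB()]
    for _ in range(n_rounds):
        for v, clf in enumerate(clfs):
            if not unlabeled:
                break
            # Train this view's classifier on the current labeled pool,
            # which includes pseudo-labels contributed by the other view.
            idx = sorted(labeled)
            clf.fit(X[np.ix_(idx, views[v])], y_work[idx])

            # Pseudo-label the most confident unlabeled examples.
            proba = clf.predict_proba(X[np.ix_(unlabeled, views[v])])
            top = np.argsort(proba.max(axis=1))[-n_add:]
            for j in top:
                y_work[unlabeled[j]] = clf.classes_[proba[j].argmax()]
            newly = {unlabeled[j] for j in top}
            labeled |= newly
            unlabeled = [i for i in unlabeled if i not in newly]
    return clfs, views
```

In this setup the two theoretical conditions the paper verifies map directly onto the sketch: sufficiency means each `views[v]` alone supports an accurate classifier, and independence means the two views' predictions are conditionally independent given the class, so each view's confident pseudo-labels act like fresh labeled data for the other.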