Combining labeled and unlabeled data with co-training
COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data
The Journal of Machine Learning Research
A high-performance semi-supervised learning method for text chunking
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Online Passive-Aggressive Algorithms
The Journal of Machine Learning Research
Applying alternating structure optimization to word sense disambiguation
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Domain adaptation with structural correspondence learning
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Automatic compilation of travel information from automatically identified travel blogs
ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Topic classification of blog posts using distant supervision
Proceedings of the Workshop on Semantic Analysis in Social Media
Hi-index | 0.00 |
Blog classification (e.g., identifying bloggers' gender or age) is one of the most interesting current problems in blog analysis. Although this problem is usually solved by applying supervised learning techniques, the large labeled dataset required for training is not always available. In contrast, unlabeled blogs can easily be collected from the web. Therefore, a semi-supervised learning method for blog classification, effectively using unlabeled data, is proposed. In this method, entries from the same blog are assumed to have the same characteristics. With this assumption, the proposed method captures the characteristics of each blog, such as writing style and topic, and uses these characteristics to improve the classification accuracy.