Many semi-supervised learning algorithms consider only the distribution of word frequencies, ignoring the semantic and syntactic information underlying the documents. In this paper, we present a new multi-view approach to semi-supervised document classification that incorporates both semantic and syntactic information. To this end, we propose a co-training style algorithm, Co-features. In the active-querying phase, each sample document is assigned a weight according to its uncertainty; the most informative samples are then selected and labeled by the other "teachers". In contrast to batch training, we develop an incremental Naive Bayes update method that allows efficient training even with a large pool of unlabeled data. Experimental results on the Reuters-21578 and WebKB datasets show that our algorithm works successfully and learns more efficiently than Co-testing.
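The incremental Naive Bayes update mentioned above can be illustrated with a small sketch. This is not the paper's implementation (which is not reproduced here); it is a minimal multinomial Naive Bayes that folds each newly labeled document into its sufficient statistics (class and word counts) instead of retraining from scratch, which is what makes repeated labeling rounds over a large unlabeled pool cheap. All class and method names are hypothetical.

```python
import math
from collections import defaultdict

class IncrementalNB:
    """Multinomial Naive Bayes with incremental count updates.

    Illustrative sketch only: updating the per-class word counts in place
    stands in for the incremental update step described in the abstract.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                                        # Laplace smoothing
        self.class_doc_counts = defaultdict(int)                  # documents per class
        self.word_counts = defaultdict(lambda: defaultdict(int))  # word counts per class
        self.class_total_words = defaultdict(int)                 # total words per class
        self.vocab = set()
        self.n_docs = 0

    def update(self, words, label):
        """Fold one newly labeled document into the counts (no retraining)."""
        self.n_docs += 1
        self.class_doc_counts[label] += 1
        for w in words:
            self.word_counts[label][w] += 1
            self.class_total_words[label] += 1
            self.vocab.add(w)

    def predict(self, words):
        """Return the class with the highest log posterior."""
        vocab_size = len(self.vocab)
        best_class, best_lp = None, float("-inf")
        for c in self.class_doc_counts:
            lp = math.log(self.class_doc_counts[c] / self.n_docs)  # log prior
            for w in words:
                num = self.word_counts[c][w] + self.alpha
                den = self.class_total_words[c] + self.alpha * vocab_size
                lp += math.log(num / den)                          # log likelihood
            if lp > best_lp:
                best_class, best_lp = c, lp
        return best_class
```

In a co-training loop, each call to `update` would add a document labeled by the other view's "teacher"; because only counts change, the cost per update is linear in the document length rather than in the size of the labeled pool.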