This paper presents a learning method, called iterative cross-training (ICT), for identifying Thai Web pages. Our method combines two classifiers, a word segmentation classifier and a naive Bayes classifier, that use unlabeled examples to train each other. We compare ICT against other supervised and unsupervised learning methods: a supervised word segmentation classifier (S-Word), a supervised naive Bayes classifier (S-Bayes), an unsupervised naive Bayes classifier using the EM algorithm (U-Bayes-EM), and a co-training-style classifier (CoTraining). The experimental results show that ICT gives the best performance, followed by S-Bayes, CoTraining, U-Bayes-EM, and S-Word.
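The idea of two classifiers training each other on unlabeled examples can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the abstract does not specify the classifiers' internals, so everything below is an assumption made for illustration. `LexiconClassifier` is a hypothetical stand-in for the word segmentation classifier (it scores a page by the fraction of tokens found in a small lexicon and learns only a threshold), `NaiveBayes` is a plain multinomial model with add-one smoothing, and the documents are romanized, whitespace-separated toy strings rather than real Thai text.

```python
import math
from collections import Counter


class LexiconClassifier:
    """Hypothetical stand-in for the word segmentation classifier (assumption)."""

    def __init__(self, lexicon, threshold):
        self.lexicon = set(lexicon)
        self.threshold = threshold

    def score(self, doc):
        # Fraction of tokens found in the lexicon; whitespace splitting is a
        # simplification, since real Thai text needs actual word segmentation.
        tokens = doc.split()
        return sum(t in self.lexicon for t in tokens) / max(len(tokens), 1)

    def predict(self, doc):
        return self.score(doc) >= self.threshold

    def train(self, docs, labels):
        # Re-estimate the threshold as the midpoint between the class means.
        pos = [self.score(d) for d, y in zip(docs, labels) if y]
        neg = [self.score(d) for d, y in zip(docs, labels) if not y]
        if pos and neg:
            self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def train(self, docs, labels):
        self.counts = {True: Counter(), False: Counter()}
        n = {True: 0, False: 0}
        for d, y in zip(docs, labels):
            self.counts[y].update(d.split())
            n[y] += 1
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        self.priors = {y: (n[y] + 1) / (len(docs) + 2) for y in (True, False)}

    def _log_prob(self, doc, y):
        total = sum(self.counts[y].values()) + len(self.vocab)
        lp = math.log(self.priors[y])
        for t in doc.split():
            lp += math.log((self.counts[y][t] + 1) / total)
        return lp

    def predict(self, doc):
        return self._log_prob(doc, True) >= self._log_prob(doc, False)


def iterative_cross_training(unlabeled, clf_a, clf_b, rounds=3):
    # clf_a starts from its prior knowledge (lexicon and initial threshold).
    labels = [clf_a.predict(d) for d in unlabeled]
    for _ in range(rounds):
        clf_b.train(unlabeled, labels)                   # B learns from A's labels
        labels = [clf_b.predict(d) for d in unlabeled]
        clf_a.train(unlabeled, labels)                   # A learns from B's labels
        labels = [clf_a.predict(d) for d in unlabeled]
    return labels


# Toy, romanized data (hypothetical); the first three play the role of Thai pages.
docs = [
    "sawasdee khrap sabai dee",
    "sawasdee sabai dee mai",
    "khrap sabai dee",
    "hello world news today",
    "world news today sports",
    "hello sports today",
]
clf_a = LexiconClassifier({"sawasdee", "khrap"}, threshold=0.3)
clf_b = NaiveBayes()
labels = iterative_cross_training(docs, clf_a, clf_b)
print(labels, clf_a.threshold)
```

In this toy run the initial threshold is too strict for the second document, and the labels fed back from the naive Bayes model are what allow the lexicon classifier to lower it. No labeled examples are used at any point; the only prior knowledge is the lexicon and the starting threshold.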