Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
Transferring naive bayes classifiers for text classification
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1
Text classification is a well-known problem with many applications. For decades, it has been believed that a large corpus is one of the most important factors for better classification. However, even when a great number of documents is available for training a classifier, it is practically impossible to achieve ideal performance, since the distributions of labeled and unlabeled documents often differ. To overcome this problem, this paper describes a novel Naïve Bayes classifier for text classification under a distribution difference between training and test data. The proposed method approximates the test distribution by weighting labeled documents to cope with the distribution difference. Unlike other transfer learning methods that estimate weights for labeled documents, the proposed method considers both the documents and their estimated class labels. Therefore, the proposed method naturally combines the advantages of semi-supervised learning with those of transfer learning.
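To make the idea of weighting labeled documents concrete, the following is a minimal sketch of a multinomial Naïve Bayes classifier in which each labeled document contributes a fractional count. It is not the paper's algorithm: the abstract does not say how the weights are estimated or how the estimated class labels enter the computation, so the weights here are simply passed in as given. The function names `train_weighted_nb` and `predict`, the smoothing parameter `alpha`, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train_weighted_nb(docs, labels, weights, alpha=1.0):
    """Fit multinomial Naive Bayes where document i counts weights[i] times.

    Assumption: weights are positive and supplied by the caller; how to
    estimate them from unlabeled test data is outside this sketch.
    """
    class_mass = defaultdict(float)                      # weighted document count per class
    word_mass = defaultdict(lambda: defaultdict(float))  # weighted token count per class
    vocab = set()
    for tokens, label, w in zip(docs, labels, weights):
        class_mass[label] += w
        for tok in tokens:
            word_mass[label][tok] += w
            vocab.add(tok)

    total = sum(class_mass.values())
    model = {}
    for c in class_mass:
        # Laplace-style smoothing over the shared vocabulary.
        denom = sum(word_mass[c].values()) + alpha * len(vocab)
        log_prior = math.log(class_mass[c] / total)
        log_like = {
            t: math.log((word_mass[c].get(t, 0.0) + alpha) / denom) for t in vocab
        }
        model[c] = (log_prior, log_like)
    return model


def predict(model, tokens):
    """Return the class with the highest log posterior; unseen tokens are skipped."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[t] for t in tokens if t in log_like)
    return max(model, key=score)


# Toy data (hypothetical): four labeled documents, two classes.
docs = [["ball", "goal"], ["ball", "match"], ["vote", "poll"], ["vote", "ball"]]
labels = ["sport", "sport", "politics", "politics"]

uniform_model = train_weighted_nb(docs, labels, [1.0, 1.0, 1.0, 1.0])
shifted_model = train_weighted_nb(docs, labels, [1.0, 1.0, 1.0, 5.0])
```

With uniform weights, `predict(uniform_model, ["ball"])` returns `"sport"`. Up-weighting the last document, as if it resembled the test data more closely, shifts both the class prior and the word likelihoods, so `predict(shifted_model, ["ball"])` returns `"politics"`.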