Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
This paper introduces a method for building a classifier from labelled and unlabelled data. We set up the steps of the Expectation-Maximization (EM) algorithm for the particular case of the naive Bayes approach and present empirical work on a restricted web page database. Original contributions include the application of the EM algorithm to simulated data, in order to observe its behavior for different numbers of labelled and unlabelled examples, and a study of how the sampling mechanism for the unlabelled data affects the results.
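The abstract does not spell out the update equations, so the following is only a minimal sketch of how EM is commonly combined with a multinomial naive Bayes classifier, not the authors' exact procedure. It assumes documents are lists of integer word ids, uses Laplace smoothing (`alpha`), and initializes the parameters from the labelled documents alone; all function and parameter names (`train_nb`, `posterior`, `em_naive_bayes`, `predict`) are hypothetical and chosen for illustration.

```python
import math

def train_nb(docs, resp, n_classes, vocab_size, alpha=1.0):
    """M-step: estimate log priors and log word probabilities
    from per-document class weights (hard or soft)."""
    class_weight = [0.0] * n_classes
    word_counts = [[0.0] * vocab_size for _ in range(n_classes)]
    for doc, r in zip(docs, resp):
        for c in range(n_classes):
            class_weight[c] += r[c]
            for w in doc:
                word_counts[c][w] += r[c]
    total = sum(class_weight)
    # Laplace smoothing (an assumption) keeps every probability non-zero.
    log_prior = [math.log((class_weight[c] + alpha) / (total + alpha * n_classes))
                 for c in range(n_classes)]
    log_word = []
    for c in range(n_classes):
        denom = sum(word_counts[c]) + alpha * vocab_size
        log_word.append([math.log((word_counts[c][w] + alpha) / denom)
                         for w in range(vocab_size)])
    return log_prior, log_word

def posterior(doc, log_prior, log_word):
    """E-step for one document: P(class | doc) under the current parameters."""
    scores = [lp + sum(lw[w] for w in doc) for lp, lw in zip(log_prior, log_word)]
    m = max(scores)  # subtract the max before exponentiating for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def em_naive_bayes(labeled, labels, unlabeled, n_classes, vocab_size, n_iter=10):
    """Alternate E- and M-steps; labelled documents keep their fixed labels."""
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    # Initial parameters come from the labelled documents only.
    log_prior, log_word = train_nb(labeled, hard, n_classes, vocab_size)
    for _ in range(n_iter):
        # E-step: soft class memberships for the unlabelled documents.
        soft = [posterior(d, log_prior, log_word) for d in unlabeled]
        # M-step: refit on labelled (hard) plus unlabelled (soft) documents.
        log_prior, log_word = train_nb(labeled + unlabeled, hard + soft,
                                       n_classes, vocab_size)
    return log_prior, log_word

def predict(doc, log_prior, log_word):
    """Return the index of the most probable class for a document."""
    p = posterior(doc, log_prior, log_word)
    return max(range(len(p)), key=p.__getitem__)

# Toy data (made up for illustration): words 0-1 suggest class 0, words 2-3 class 1.
labeled = [[0, 0, 1], [2, 3, 3]]
labels = [0, 1]
unlabeled = [[0, 1, 1], [1, 1, 0], [2, 2, 3], [3, 2, 2]]
log_prior, log_word = em_naive_bayes(labeled, labels, unlabeled,
                                     n_classes=2, vocab_size=4)
print(predict([1, 1], log_prior, log_word), predict([2, 2], log_prior, log_word))
```

Varying the sizes of `labeled` and `unlabeled` in such a sketch is one simple way to observe how the estimates respond to different amounts of labelled and unlabelled data, in the spirit of the simulation study the abstract describes.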