Because users rarely have the patience to provide much labeled data, a personalized filter is expected to converge quickly. Topic-model-based dimension reduction can help minimize structural risk when training data are limited. In this paper, we propose a novel supervised dual-PLSA, which estimates topics from several kinds of observable data: labeled documents, unlabeled documents, and supervised information about topics. We first propose a c-w PLSA model, in which words and classes are observed variables and topics are latent. We then combine two generative models, c-w PLSA and standard PLSA, so that they share observed variables, which lets the model exploit additional observed data. Finally, supervised information about topics is incorporated; the resulting model is the supervised dual-PLSA. Experiments show that dual-PLSA converges very quickly: within the first 100 gold-standard feedback items, its cumulative error rate drops to 9%, and its total error rate of 6.94% is the lowest among all the filters compared.
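The abstract does not give the model equations, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes that standard PLSA models P(w|d) = Σ_z P(z|d) P(w|z), that c-w PLSA models P(w|c) = Σ_z P(z|c) P(w|z) over class-word counts built from the labeled documents, and that the two models are combined by sharing the topic-word distributions P(w|z). Under that assumption the joint log-likelihood is ordinary PLSA over a matrix that stacks the document rows and one pseudo-document row per class, so a single EM loop fits both parts. The helper names `build_dual_matrix` and `plsa_shared` and the toy counts are hypothetical, and the supervised information about topics is not modeled here.

```python
import math
import random


def build_dual_matrix(doc_counts, labels, n_classes):
    """Hypothetical construction: append one class pseudo-document per class.

    Class row c sums the word counts of the labeled documents with label c,
    so the document rows (standard PLSA) and the class rows (assumed c-w PLSA)
    can share P(w|z). Unlabeled documents have label None.
    """
    n_words = len(doc_counts[0])
    class_rows = [[0] * n_words for _ in range(n_classes)]
    for row, label in zip(doc_counts, labels):
        if label is None:  # unlabeled document: only contributes as a doc row
            continue
        for w, n in enumerate(row):
            class_rows[label][w] += n
    return doc_counts + class_rows


def plsa_shared(counts, n_topics, n_iters=50, seed=0):
    """EM for PLSA on counts[row][word]; all rows share P(w|z).

    Returns (p_z_given_row, p_w_given_z, log_likelihood_per_iteration).
    """
    if any(sum(row) == 0 for row in counts):
        raise ValueError("every row needs at least one word count")
    rng = random.Random(seed)
    n_rows, n_words = len(counts), len(counts[0])

    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    # Random strictly positive initialisation, normalised to distributions.
    p_z_r = [normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in range(n_rows)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)]) for _ in range(n_topics)]
    history = []

    for _ in range(n_iters):
        # Expected counts accumulated during the E-step.
        acc_z_r = [[0.0] * n_topics for _ in range(n_rows)]
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        log_lik = 0.0
        for r in range(n_rows):
            for w in range(n_words):
                n = counts[r][w]
                if n == 0:
                    continue
                # E-step: posterior P(z | r, w) is proportional to P(z|r) P(w|z).
                joint = [p_z_r[r][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                log_lik += n * math.log(total)
                for z in range(n_topics):
                    expected = n * joint[z] / total
                    acc_z_r[r][z] += expected
                    acc_w_z[z][w] += expected
        history.append(log_lik)  # log-likelihood of the current parameters
        # M-step: renormalise the expected counts.
        p_z_r = [normalize(row) for row in acc_z_r]
        p_w_z = [normalize(row) for row in acc_w_z]
    return p_z_r, p_w_z, history


# Toy usage with 4 words: two labeled and two unlabeled documents.
docs = [
    [3, 2, 0, 0],  # labeled, class 0
    [0, 0, 2, 3],  # labeled, class 1
    [2, 3, 0, 1],  # unlabeled
    [0, 1, 3, 2],  # unlabeled
]
labels = [0, 1, None, None]
matrix = build_dual_matrix(docs, labels, n_classes=2)
p_z_r, p_w_z, history = plsa_shared(matrix, n_topics=2, n_iters=30)
p_z_given_class = p_z_r[len(docs):]  # assumed class topic mixtures P(z|c)
```

Because EM never decreases the observed-data log-likelihood, `history` should be non-decreasing. Under this construction, labeled documents contribute both as document rows and through their class row; the actual model may weight the two parts differently.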