Document classification is essential to information retrieval and text mining. In practice, unlabeled data is readily available, whereas labeled data is often laborious, expensive, and slow to obtain. This paper proposes a novel document classification approach based on a semi-supervised von Mises-Fisher (vMF) mixture model on the document manifold, called the Laplacian regularized Semi-Supervised vMF Mixture Model (LapSSvMFs). The model explicitly considers the manifold structure of the document space in order to exploit both labeled and unlabeled data efficiently for classification. We have developed a generalized mean-field variational inference algorithm for LapSSvMFs. Experimental results show that our approach preserves the best accuracy of purely graph-based transductive methods when the data has "manifold structure". Furthermore, high accuracy is obtained even on overlapping and fairly skewed datasets, in comparison with other classification algorithms.
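To make the idea of Laplacian regularization on a document manifold concrete, the following is a minimal sketch and not the LapSSvMFs algorithm itself, since the abstract does not state the objective or the inference updates. It assumes documents are L2-normalized vectors (the setting in which vMF distributions apply), builds a symmetric k-nearest-neighbor graph from cosine similarity, and evaluates the standard graph smoothness term tr(FᵀLF) for a matrix F of per-document class posteriors. The function names `unit_normalize`, `knn_graph`, and `laplacian_penalty`, the binary edge weights, and the unnormalized Laplacian are all illustrative assumptions.

```python
import numpy as np


def unit_normalize(X):
    """Project each row (document vector) onto the unit hypersphere."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)


def knn_graph(X, k):
    """Symmetric binary k-NN adjacency matrix based on cosine similarity.

    Assumption: binary weights; a heat-kernel or cosine weighting is
    equally plausible and is not specified by the abstract.
    """
    Xn = unit_normalize(X)
    S = Xn @ Xn.T                      # cosine similarities
    np.fill_diagonal(S, -np.inf)       # exclude self-neighbors
    n = S.shape[0]
    idx = np.argsort(-S, axis=1)[:, :k]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, idx.ravel()] = 1.0
    return np.maximum(W, W.T)          # symmetrize


def laplacian_penalty(W, F):
    """Smoothness term tr(F^T L F) with the unnormalized Laplacian L = D - W.

    Equals 0.5 * sum_ij W_ij * ||F_i - F_j||^2, so it is small when
    neighboring documents have similar class posteriors.
    """
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(F.T @ L @ F))


if __name__ == "__main__":
    # Toy data: two documents near each axis of a 2-D term space.
    X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    W = knn_graph(X, k=1)
    F_smooth = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    F_rough = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
    print(laplacian_penalty(W, F_smooth), laplacian_penalty(W, F_rough))
```

In a regularized mixture model of this kind, a term like this would typically be added to the (negative) log-likelihood with a trade-off weight, so that posteriors agreeing with the neighborhood graph are preferred; how LapSSvMFs combines the two is specified in the paper rather than here.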