Document classification using semi-supervised mixture model of von Mises-Fisher distributions on document manifold

Authors:
Nguyen Kim Anh;Ngo Van Linh;Le Hong Ky;Tam Nguyen The
Affiliations:
Hanoi University of Science and Technology;Hanoi University of Science and Technology;Hanoi University of Science and Technology;Hanoi University of Science and Technology
Venue:
Proceedings of the Fourth Symposium on Information and Communication Technology
Year:
2013


Abstract

Document classification is essential to information retrieval and text mining. In practice, unlabeled data is readily available, whereas labeled data is often laborious, expensive, and slow to obtain. This paper proposes a novel document classification approach based on a semi-supervised von Mises-Fisher (vMF) mixture model on the document manifold, called the Laplacian regularized Semi-Supervised vMF Mixture Model (LapSSvMFs), which explicitly considers the manifold structure of the document space in order to exploit both labeled and unlabeled data efficiently for classification. We have developed a generalized mean-field variational inference algorithm for LapSSvMFs. Experimental results show that our approach preserves the best accuracy of purely graph-based transductive methods when the data has "manifold structure". Furthermore, it achieves high accuracy relative to other classification algorithms even on overlapping and fairly skewed datasets.
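
The abstract does not give the model's equations, so the following is only a minimal sketch of the two ingredients it names: class posteriors under a vMF mixture, and a graph Laplacian smoothness penalty that encourages neighboring documents on the manifold to receive similar posteriors. Several choices here are assumptions rather than the authors' method: a single shared concentration `kappa` (so the vMF normalizing constant cancels), a symmetric 0/1 k-nearest-neighbor graph built from cosine similarity, and the hypothetical function names `vmf_posteriors`, `knn_cosine_graph`, and `laplacian_penalty`. The generalized mean-field variational inference mentioned in the abstract is not reproduced.

```python
import numpy as np

def l2_normalize(X):
    """Project rows onto the unit hypersphere (vMF models directions)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)

def vmf_posteriors(X, mus, kappa, log_priors):
    """Class posteriors under a vMF mixture with a shared kappa (assumption).

    With one kappa for all components, the normalizer C_d(kappa) is the same
    for every class and cancels, leaving softmax(kappa * mu_k^T x + log pi_k).
    """
    logits = kappa * (X @ mus.T) + log_priors
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def knn_cosine_graph(X, k):
    """Symmetric 0/1 k-nearest-neighbor graph from cosine similarity."""
    S = X @ X.T                      # rows are unit vectors, so this is cosine
    np.fill_diagonal(S, -np.inf)     # exclude self-neighbors
    n = X.shape[0]
    W = np.zeros((n, n))
    nbrs = np.argsort(-S, axis=1)[:, :k]
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(W, W.T)        # symmetrize

def laplacian_penalty(Q, W):
    """trace(Q^T L Q) with L = D - W, i.e. 0.5 * sum_ij W_ij ||q_i - q_j||^2."""
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(Q.T @ L @ Q))

# Toy data standing in for TF-IDF document vectors (illustrative only).
rng = np.random.default_rng(0)
X = l2_normalize(np.abs(rng.normal(size=(8, 5))))
mus = l2_normalize(X[:2])            # two hypothetical class mean directions
Q = vmf_posteriors(X, mus, kappa=10.0, log_priors=np.log([0.5, 0.5]))
W = knn_cosine_graph(X, k=2)
penalty = laplacian_penalty(Q, W)
```

In a Laplacian regularized objective of this general kind, a penalty such as `penalty` would be weighted and subtracted from the data log-likelihood, so that the fitted posteriors vary smoothly along the neighbor graph; how LapSSvMFs combines the two terms is not stated in the abstract.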