Document classification using semi-supervised mixture model of von Mises-Fisher distributions on document manifold

  • Authors:
  • Nguyen Kim Anh;Ngo Van Linh;Le Hong Ky;Tam Nguyen The

  • Affiliations:
  • Hanoi University of Science and Technology;Hanoi University of Science and Technology;Hanoi University of Science and Technology;Hanoi University of Science and Technology

  • Venue:
  • Proceedings of the Fourth Symposium on Information and Communication Technology
  • Year:
  • 2013

Abstract

Document classification is essential to information retrieval and text mining. In real life, unlabeled data is readily available, whereas labeled data is often laborious, expensive and slow to obtain. This paper proposes a novel document classification approach based on a semi-supervised von Mises-Fisher (vMF) mixture model on the document manifold, called the Laplacian regularized Semi-Supervised vMF Mixture Model (LapSSvMFs), which explicitly considers the manifold structure of the document space in order to exploit both labeled and unlabeled data efficiently for classification. We have developed a generalized mean-field variational inference algorithm for the LapSSvMFs. Experimental results show that our approach preserves the best accuracy of purely graph-based transductive methods when the data has "manifold structure". Furthermore, high accuracy is obtained even on overlapping and fairly skewed datasets, compared with other classification algorithms.
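The abstract combines three ingredients: vMF components over length-normalized document vectors, labeled documents that pin down their class, and a document graph whose neighbours should receive similar class posteriors. The sketch below illustrates how these pieces can fit together. It is not the paper's generalized mean-field variational algorithm. It swaps in plain EM with a LapGMM-style posterior-smoothing step, and it assumes one vMF component per class with a single shared, fixed concentration `kappa`, so the vMF normalizing constant cancels in the posteriors. All function and parameter names (`lap_ss_vmf`, `knn_affinity`, `normalize_rows`, `kappa`, `gamma`, `k`, `n_smooth`) are hypothetical.

```python
import numpy as np


def normalize_rows(X):
    """Project each row onto the unit hypersphere, where vMF data lives."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)


def knn_affinity(X, k=5):
    """Symmetric 0/1 k-nearest-neighbour graph built from cosine similarity."""
    S = X @ X.T
    np.fill_diagonal(S, -np.inf)              # exclude self-loops
    idx = np.argsort(-S, axis=1)[:, :k]       # k most similar documents per row
    W = np.zeros_like(S)
    rows = np.repeat(np.arange(X.shape[0]), k)
    W[rows, idx.ravel()] = 1.0
    return np.maximum(W, W.T)                 # symmetrize


def lap_ss_vmf(X, labels, n_classes, kappa=20.0, gamma=0.5, k=5,
               n_iter=50, n_smooth=10):
    """Hypothetical sketch: semi-supervised vMF mixture, EM + graph smoothing.

    labels: int array with -1 for unlabeled documents.
    Assumes a shared, fixed kappa (normalizing constant cancels).
    """
    X = normalize_rows(X)
    n = X.shape[0]
    W = knn_affinity(X, k)
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)  # neighbour averaging
    labeled = labels >= 0
    Y = np.zeros((n, n_classes))
    Y[labeled, labels[labeled]] = 1.0
    # Initialise mean directions from the labeled documents of each class.
    mu = normalize_rows(Y[labeled].T @ X[labeled])
    pi = np.full(n_classes, 1.0 / n_classes)
    for _ in range(n_iter):
        # E-step: log r_ik = log pi_k + kappa * mu_k^T x_i (+ shared constant).
        logits = np.log(pi) + kappa * (X @ mu.T)
        logits -= logits.max(axis=1, keepdims=True)
        R = np.exp(logits)
        R /= R.sum(axis=1, keepdims=True)
        # Graph smoothing: pull each posterior toward its neighbours' average,
        # keeping labeled documents clamped to their one-hot labels.
        R[labeled] = Y[labeled]
        for _ in range(n_smooth):
            R = (1 - gamma) * R + gamma * (P @ R)
            R[labeled] = Y[labeled]
        # M-step: re-estimate mean directions and mixing weights.
        mu = normalize_rows(R.T @ X)
        pi = np.maximum(R.mean(axis=0), 1e-12)
    return R.argmax(axis=1)


# Toy usage: two noisy directions in 20-D, only two labeled documents per class.
rng = np.random.default_rng(0)
d = 20
c0, c1 = np.eye(d)[0], np.eye(d)[1]
X = np.vstack([c0 + 0.15 * rng.standard_normal((50, d)),
               c1 + 0.15 * rng.standard_normal((50, d))])
y = np.array([0] * 50 + [1] * 50)
labels = np.full(100, -1)
labels[[0, 1, 50, 51]] = y[[0, 1, 50, 51]]
pred = lap_ss_vmf(X, labels, n_classes=2)
accuracy = (pred == y).mean()
```

Clamping labeled rows inside the smoothing loop lets label information propagate along the neighbour graph to unlabeled documents. This is the intuition behind exploiting the manifold structure. A faithful implementation would instead follow the paper's variational updates and would typically learn a per-component concentration, which brings back the Bessel-function normalizing constant.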