Unsupervised Text Learning Based on Context Mixture Model with Dirichlet Prior

  • Authors:
  • Dongling Chen; Daling Wang; Ge Yu

  • Affiliations:
  • Northeastern University, Shenyang, P.R. China 110004 and School of Information, Shenyang University, Shenyang, P.R. China 110044; Northeastern University, Shenyang, P.R. China 110004; Northeastern University, Shenyang, P.R. China 110004

  • Venue:
  • Advanced Web and Network Technologies, and Applications
  • Year:
  • 2008


Abstract

In this paper, we propose a Bayesian mixture model that introduces a context variable with a Dirichlet prior into a Bayesian framework to model the multiple topics of a text and then cluster documents. It is a novel unsupervised text learning algorithm for clustering large-scale web data. For parameter estimation, we adopt Maximum Likelihood (ML) estimation via the EM algorithm, and we employ the BIC principle to determine the number of clusters. Experimental results show that the proposed method distinctly outperforms the baseline algorithms.
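The pipeline the abstract describes — an EM-fitted mixture over text with Dirichlet smoothing, plus BIC to choose the number of clusters — can be sketched as follows. This is a minimal illustration, not the paper's exact model: it uses a plain mixture of multinomials with a symmetric Dirichlet parameter `alpha` as additive smoothing standing in for the context variable's prior, and the variable names are my own.

```python
import numpy as np

def em_multinomial_mixture(X, K, alpha=1.0, n_iter=50, seed=0):
    """EM for a mixture of multinomials over a document-word count matrix
    X of shape (D, V). `alpha` is symmetric Dirichlet (additive) smoothing,
    an assumption standing in for the paper's context-variable prior.
    The constant multinomial coefficient is dropped from the likelihood;
    it is the same for every K, so BIC comparisons are unaffected."""
    rng = np.random.default_rng(seed)
    D, V = X.shape
    pi = np.full(K, 1.0 / K)                   # mixture weights
    theta = rng.dirichlet(np.full(V, 2.0), K)  # per-cluster word distributions
    for _ in range(n_iter):
        # E-step: cluster responsibilities, computed in the log domain
        log_p = np.log(pi) + X @ np.log(theta).T   # (D, K) unnormalized
        m = log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p - m)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: ML updates, with Dirichlet smoothing on the word distributions
        pi = r.sum(axis=0) / D
        theta = r.T @ X + alpha
        theta /= theta.sum(axis=1, keepdims=True)
    # final log-likelihood (up to the dropped multinomial constant)
    log_p = np.log(pi) + X @ np.log(theta).T
    m = log_p.max(axis=1, keepdims=True)
    ll = float((m.ravel() + np.log(np.exp(log_p - m).sum(axis=1))).sum())
    return pi, theta, r, ll

def bic(ll, K, V, D):
    """BIC = -2 log L + (#free parameters) * log D; lower is better."""
    n_params = (K - 1) + K * (V - 1)
    return -2.0 * ll + n_params * np.log(D)
```

Model selection then amounts to fitting the mixture for a range of K and keeping the K with the lowest BIC, trading fit quality against the parameter-count penalty.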