Unsupervised Text Learning Based on Context Mixture Model with Dirichlet Prior

  • Authors:
  • Dongling Chen; Daling Wang; Ge Yu

  • Affiliations:
  • Northeastern University, Shenyang, P.R. China 110004 and School of Information, Shenyang University, Shenyang, P.R. China 110044; Northeastern University, Shenyang, P.R. China 110004; Northeastern University, Shenyang, P.R. China 110004

  • Venue:
  • Advanced Web and Network Technologies, and Applications
  • Year:
  • 2008


Abstract

In this paper, we propose a Bayesian mixture model that introduces a context variable with a Dirichlet prior into a Bayesian framework to model the multiple topics of a text and then cluster documents. It is a novel unsupervised text learning algorithm for clustering large-scale web data. For parameter estimation, we adopt Maximum Likelihood (ML) estimation via the EM algorithm, and we employ the BIC principle to determine the number of clusters. Experimental results show that the proposed method distinctly outperforms the baseline algorithms.
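The pipeline the abstract describes — an EM-fitted mixture over text with Dirichlet smoothing, plus BIC to choose the number of clusters — can be sketched as follows. This is a minimal illustration, not the paper's exact model: it uses a plain mixture of multinomials with a symmetric Dirichlet parameter `alpha` as additive smoothing standing in for the context variable's prior, and the variable names are my own.

```python
import numpy as np

def em_multinomial_mixture(X, K, alpha=1.0, n_iter=50, seed=0):
    """EM for a mixture of multinomials over a document-word count matrix
    X of shape (D, V). `alpha` is symmetric Dirichlet (additive) smoothing,
    an assumption standing in for the paper's context-variable prior.
    The constant multinomial coefficient is dropped from the likelihood;
    it is the same for every K, so BIC comparisons are unaffected."""
    rng = np.random.default_rng(seed)
    D, V = X.shape
    pi = np.full(K, 1.0 / K)                   # mixture weights
    theta = rng.dirichlet(np.full(V, 2.0), K)  # per-cluster word distributions
    for _ in range(n_iter):
        # E-step: cluster responsibilities, computed in the log domain
        log_p = np.log(pi) + X @ np.log(theta).T   # (D, K) unnormalized
        m = log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p - m)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: ML updates, with Dirichlet smoothing on the word distributions
        pi = r.sum(axis=0) / D
        theta = r.T @ X + alpha
        theta /= theta.sum(axis=1, keepdims=True)
    # final log-likelihood (up to the dropped multinomial constant)
    log_p = np.log(pi) + X @ np.log(theta).T
    m = log_p.max(axis=1, keepdims=True)
    ll = float((m.ravel() + np.log(np.exp(log_p - m).sum(axis=1))).sum())
    return pi, theta, r, ll

def bic(ll, K, V, D):
    """BIC = -2 log L + (#free parameters) * log D; lower is better."""
    n_params = (K - 1) + K * (V - 1)
    return -2.0 * ll + n_params * np.log(D)
```

Model selection then amounts to fitting the mixture for a range of K and keeping the K with the lowest BIC, trading fit quality against the parameter-count penalty.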