Text categorization based on topic model

Authors:
Shibin Zhou;Kan Li;Yushu Liu
Affiliations:
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, P.R. China;School of Computer Science and Technology, Beijing Institute of Technology, Beijing, P.R. China;School of Computer Science and Technology, Beijing Institute of Technology, Beijing, P.R. China
Venue:
RSKT'08 Proceedings of the 3rd international conference on Rough sets and knowledge technology
Year:
2008

Abstract

In the text literature, many topic models have been proposed that represent documents and words in terms of latent topics so that text can be processed effectively and accurately. In this paper, we propose the Latent Dirichlet Allocation Category Language Model (LDACLM) for text categorization and estimate the model parameters by variational inference. As a variant of the Latent Dirichlet Allocation model, LDACLM treats the documents of each category as a language model and uses the variational parameters to compute maximum a posteriori estimates of term probabilities. Experiments show that LDACLM is effective for text categorization, outperforming the standard Naive Bayes and Rocchio methods.
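
The abstract's idea of treating each category's documents as a language model can be illustrated with a much simpler stand-in. The sketch below is not LDACLM: it has no latent topics and no variational inference. It fits one unigram language model per category with additive smoothing, which corresponds to a maximum a posteriori estimate under a symmetric Dirichlet prior with parameter `alpha + 1`, and assigns a document to the category whose model gives it the highest log-likelihood. The function names, the `alpha` value, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_category_models(docs_by_category, alpha=1.0):
    """Fit one smoothed unigram language model per category.

    docs_by_category maps a category name to a list of tokenized documents.
    alpha is a hypothetical pseudo-count (an assumption, not a value from
    the paper); it must be positive so every term has non-zero probability.
    """
    # Shared vocabulary across all categories.
    vocab = {w for docs in docs_by_category.values() for d in docs for w in d}
    models = {}
    for cat, docs in docs_by_category.items():
        counts = Counter(w for d in docs for w in d)
        total = sum(counts.values()) + alpha * len(vocab)
        # Smoothed estimate of P(term | category) for every vocabulary term.
        models[cat] = {w: (counts[w] + alpha) / total for w in vocab}
    return models


def classify(tokens, models):
    """Return the category whose language model best explains the tokens."""
    def log_likelihood(cat):
        model = models[cat]
        # Terms outside the training vocabulary are skipped.
        return sum(math.log(model[w]) for w in tokens if w in model)
    return max(models, key=log_likelihood)


# Toy data, invented purely for illustration.
training = {
    "sports": [["ball", "goal", "team"], ["team", "win"]],
    "finance": [["stock", "market"], ["market", "bank", "stock"]],
}
models = train_category_models(training)
label = classify(["team", "goal"], models)
print(label)
```

In this simplified setting the per-category term distribution is estimated directly from counts; in the approach the abstract describes, those estimates are instead derived from the variational parameters of an LDA-style model.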