Topic models such as PLSI (Probabilistic Latent Semantic Indexing) and LDA (Latent Dirichlet Allocation) were originally developed for modeling the contents of plain texts. Recently, topic models for processing hypertexts such as web pages have also been proposed. These hypertext models are generative models that give rise to both words and hyperlinks. This paper argues that, to better represent the contents of hypertexts, it is preferable to treat the hyperlinks as fixed and to define the topic model over the generation of words only. The paper then proposes a new topic model for hypertext processing, referred to as the Hypertext Topic Model (HTM). HTM defines the distribution of words in a document (i.e., the content of the document) as a mixture over the latent topics of the document itself and the latent topics of the documents it cites. The topics are in turn characterized as distributions over words, as in conventional topic models. The paper further proposes a method for learning the HTM model. Experimental results on three datasets show that HTM outperforms the baselines on topic discovery and document classification.
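The core modeling idea above can be sketched numerically: a document's word distribution is a mixture over its own topic proportions and those of the documents it cites, with each topic being a distribution over words. The sketch below is illustrative only; all dimensions, parameter values, and names (`lam`, `theta_d`, `theta_c`, `phi`) are assumptions, not the paper's actual learned parameters or learning method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration
V, K = 6, 2                              # vocabulary size, number of topics
phi = rng.dirichlet(np.ones(V), size=K)  # topic-word distributions, shape (K, V)

# Topic proportions for a document d and for one document c that d cites
theta_d = np.array([0.8, 0.2])
theta_c = np.array([0.1, 0.9])

# Mixing weights: how much d's content draws on its own topics
# versus the topics of the document it cites (assumed values)
lam = np.array([0.7, 0.3])

def htm_word_dist(thetas, lam, phi):
    """HTM-style word distribution for a document:
    P(w) = sum_s lam[s] * sum_k theta_s[k] * phi[k, w],
    where s ranges over the document itself and its cited documents."""
    return sum(l * theta @ phi for l, theta in zip(lam, thetas))

p_w = htm_word_dist([theta_d, theta_c], lam, phi)
words = rng.choice(V, size=10, p=p_w)    # sample word tokens for the document
```

Because the hyperlinks (here, which documents `d` cites) are treated as fixed, only the words are generated; the citation structure simply determines which topic proportions enter the mixture.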