Probabilistic latent semantic indexing
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Normalized Cuts and Image Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence
Automating the Construction of Internet Portals with Machine Learning
Information Retrieval
Learning to Probabilistically Identify Authoritative Documents
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Document clustering based on non-negative matrix factorization
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
The Journal of Machine Learning Research
Correlated PLSA for image clustering
MMM'11 Proceedings of the 17th international conference on Advances in multimedia modeling - Volume Part I
Blogger-Link-Topic model for blog mining
PAKDD'11 Proceedings of the 15th international conference on New Frontiers in Applied Data Mining
Plink-LDA: using link as prior information in topic modeling
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I
Probabilistic Models for Social Media Mining
International Journal of Information Technology and Web Engineering
Hi-index | 0.00 |
Documents in many corpora, such as digital libraries and webpages, contain both content and link information. To explicitly consider the document relations represented by links, in this paper we propose a citation-topic (CT) model which assumes a probabilistic generative process for corpora. In the CT model a given document is modeled as a mixture of a set of topic distributions, each of which is borrowed (cited) from a document that is related to the given document. Moreover, the CT model contains a random process for selecting the related documents according to the structure of the generative model determined by links and therefore, the transitivity of the relations among documents is captured. We apply the CT model on the document clustering task and the experimental comparisons against several state-of-the-art approaches demonstrate very promising performances.