Probabilistic latent semantic indexing
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Normalized Cuts and Image Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence
Automating the Construction of Internet Portals with Machine Learning
Information Retrieval
Learning to Probabilistically Identify Authoritative Documents
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Combining content and link for classification using matrix factorization
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Joint latent topic models for text and citations
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Generalized latent factor models for social network analysis
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Two
Scalable text and link analysis with mixed-topic link models
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Hi-index | 0.00 |
This paper addresses the problem of community detection in networked data that combines link and content analysis. Most existing work combines link and content information by a generative model. There are two major shortcomings with the existing approaches. First, they assume that the probability of creating a link between two nodes is determined only by the community memberships of the nodes; however other factors(e.g. popularity) could also affect the link pattern. Second, they use generative models to model the content of individual nodes, whereas these generative models are vulnerable to the content attributes that are irrelevant to communities. We propose a Bayesian framework for combining link and content information for community detection that explicitly addresses these shortcomings. A new link model is presented that introduces a random variable to capture the node popularity when deciding the link between two nodes; a discriminative model is used to determine the community membership of a node by its content. An approximate inference algorithm is presented for efficient Bayesian inference. Our empirical study shows that the proposed framework outperforms several state-of-the-art approaches in combining link and content information for community detection.