Incorporating popularity in topic models for social network analysis

Authors:
Youngchul Cha; Bin Bi; Chu-Cheng Hsieh; Junghoo Cho
Affiliations:
UCLA, Los Angeles, CA, USA; UCLA, Los Angeles, CA, USA; UCLA, Los Angeles, CA, USA; UCLA, Los Angeles, CA, USA
Venue:
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Year:
2013

Abstract

Topic models are used to group words in a text dataset into a set of relevant topics. Unfortunately, when a few words appear very frequently in a dataset, the topic groups identified by topic models become noisy because these frequent words repeatedly appear in "irrelevant" topic groups. This noise has not been a serious problem in text datasets because the frequent words (e.g., "the" and "is") carry little meaning and are simply removed before a topic model analysis. However, in the social network datasets we are interested in, the frequent words correspond to popular persons (e.g., Barack Obama and Justin Bieber) and cannot simply be removed because most people are interested in them. To solve this "popularity problem", we explicitly model the popularity of nodes (words) in topic models. For this purpose, we first introduce the notion of a "popularity component" and propose topic model extensions that effectively accommodate the popularity component. We evaluate the effectiveness of our models on a real-world Twitter dataset. Our proposed models achieve significantly lower perplexity (i.e., better predictive power) than the state-of-the-art baselines. In addition to the popularity problem caused by nodes with high incoming edge degree, we also investigate the effect of outgoing edge degree with another topic model extension. We show that considering outgoing edge degree does not help much in achieving lower perplexity.
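
The perplexity measure mentioned in the abstract can be illustrated with a minimal sketch. The perplexity formula (the exponential of the negative mean log-likelihood of held-out observations) is standard. Everything else here is an assumption made for illustration: the helper `mix_with_popularity`, the mixing weight `lam`, and the toy probabilities are hypothetical and are not the models or data from the paper, which the abstract does not specify.

```python
import math


def perplexity(probs):
    """Perplexity of held-out observations given their predicted probabilities.

    Computed as exp(-(1/N) * sum(log p_i)); lower means better prediction.
    """
    log_likelihood = sum(math.log(p) for p in probs)
    return math.exp(-log_likelihood / len(probs))


def mix_with_popularity(topic_prob, popularity_prob, lam):
    """Hypothetical linear mix of a topic-based and a popularity-based probability.

    This is an illustrative assumption, not the paper's popularity component.
    """
    return lam * popularity_prob + (1.0 - lam) * topic_prob


# Toy, made-up probabilities for three held-out "follow" edges.
# The first two edges point at very popular nodes that the topic-only
# predictor rates as unlikely; the third points at a niche node.
topic_only = [0.01, 0.02, 0.20]
popularity = [0.30, 0.25, 0.01]

mixed = [mix_with_popularity(t, p, lam=0.5) for t, p in zip(topic_only, popularity)]

print("topic-only perplexity:", perplexity(topic_only))
print("mixed perplexity:     ", perplexity(mixed))
```

On these toy numbers the mixed predictor gets the lower perplexity, only because the numbers were chosen so that popular nodes are poorly explained by topics alone. The sketch shows how the metric is read, not evidence for any model.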