Social-network analysis using topic models
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Topic models group the words in a text dataset into a set of relevant topics. Unfortunately, when a few words appear very frequently in a dataset, the topic groups identified by topic models become noisy because these frequent words repeatedly appear in "irrelevant" topic groups. This noise has not been a serious problem for text datasets because the frequent words (e.g., "the" and "is") carry little meaning and are simply removed before a topic model analysis. However, in the social network datasets we are interested in, they correspond to popular persons (e.g., Barack Obama and Justin Bieber) and cannot simply be removed because most people are interested in them. To solve this "popularity problem", we explicitly model the popularity of nodes (words) in topic models. For this purpose, we first introduce the notion of a "popularity component" and propose topic model extensions that effectively accommodate it. We evaluate the effectiveness of our models on a real-world Twitter dataset; our proposed models achieve significantly lower perplexity (i.e., better predictive power) than state-of-the-art baselines. In addition to the popularity problem caused by nodes with high incoming edge degree, we also investigate the effect of the outgoing edge degree with further topic model extensions, and show that modeling the outgoing edge degree does not help much in achieving lower perplexity.
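The abstract does not spell out how the popularity component enters the generative process. One common way to realize such a component, sketched below purely as an illustration (the model, parameter names, and the `lam` switch probability are assumptions, not the paper's actual specification), is a switch variable per token: with some probability the token (a followed node) is drawn from a single corpus-wide "popularity" distribution skewed toward celebrity nodes, and otherwise from one of the document's topics. Under this model, celebrity nodes surface in nearly every document without polluting the topic-specific distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

V = 50    # vocabulary of "nodes" (e.g., followable Twitter users)
K = 4     # number of topics
lam = 0.3  # assumed probability a token comes from the popularity component

# Popularity component: one corpus-wide distribution over nodes,
# heavily skewed toward a few "celebrity" nodes (here, ids 0 and 1).
popularity = np.full(V, 0.1)
popularity[:2] = 10.0
popularity /= popularity.sum()

# Topic-specific distributions over nodes, drawn from a sparse Dirichlet.
topics = rng.dirichlet(np.full(V, 0.1), size=K)

def generate_doc(n_tokens=100):
    """Generate one 'document' (a user's follow list) under the switch model:
    each token is drawn from the shared popularity component with
    probability lam, otherwise from one of the document's topics."""
    theta = rng.dirichlet(np.full(K, 0.5))  # per-document topic mixture
    tokens = []
    for _ in range(n_tokens):
        if rng.random() < lam:
            tokens.append(rng.choice(V, p=popularity))
        else:
            z = rng.choice(K, p=theta)  # pick a topic, then a node from it
            tokens.append(rng.choice(V, p=topics[z]))
    return tokens

docs = [generate_doc() for _ in range(20)]

# Fraction of documents containing at least one celebrity node:
# it should be close to 1 regardless of each document's topic mixture.
coverage = sum(1 for d in docs if 0 in d or 1 in d) / len(docs)
print(coverage)
```

The point of the sketch is the asymmetry it creates: the popularity distribution absorbs the high-frequency nodes, so the per-topic distributions stay clean, which is the behavior the abstract attributes to the proposed extensions.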