Representing documents through their readers

Authors:
Khalid El-Arini;Min Xu;Emily B. Fox;Carlos Guestrin
Affiliations:
Facebook, Menlo Park, CA, USA;Carnegie Mellon University, Pittsburgh, PA, USA;University of Washington, Seattle, WA, USA;University of Washington, Seattle, WA, USA
Venue:
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2013

Abstract

From Twitter to Facebook to Reddit, users have become accustomed to sharing the articles they read with friends or followers on their social networks. While previous work has modeled what these shared stories say about the user who shares them, the converse question remains unexplored: what can we learn about an article from the identities of its likely readers? To address this question, we model the content of news articles and blog posts by attributes of the people who are likely to share them. For example, many Twitter users describe themselves in a short profile, labeling themselves with phrases such as "vegetarian" or "liberal." By assuming that a user's labels correspond to topics in the articles they share, we can learn a labeled dictionary from a training corpus of articles shared on Twitter. Thereafter, we can code any new document as a sparse non-negative linear combination of user labels, where we encourage correlated labels to appear together in the output via a structured sparsity penalty. Finally, we show that our approach yields a novel document representation that can be effectively used in many problem settings, from recommendation to modeling news dynamics. For example, while the top politics stories will change drastically from one month to the next, the "politics" label will still be there to describe them. We evaluate our model on millions of tweeted news articles and blog posts collected between September 2010 and September 2012, demonstrating that our approach is effective.
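
The coding step described in the abstract, representing a new document as a sparse non-negative linear combination of user labels, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a tiny hand-made dictionary whose columns are word weights for three hypothetical labels, and it swaps in a plain L1 penalty for the structured sparsity penalty the abstract mentions, so correlated labels are not encouraged to co-occur here. The function name `nonneg_sparse_code`, the vocabulary, and all numeric values are illustrative assumptions.

```python
import numpy as np

def nonneg_sparse_code(D, x, lam=0.01, n_iter=500):
    """Approximately minimise 0.5*||x - D a||^2 + lam*sum(a) subject to a >= 0
    using proximal gradient descent (a non-negative variant of ISTA).

    Assumption: plain L1 sparsity stands in for the paper's structured penalty.
    """
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)          # gradient of the squared-error term
        # Proximal step for lam*sum(a) with a >= 0: shift, then clip at zero.
        a = np.maximum(a - step * (grad + lam), 0.0)
    return a

# Hypothetical labels and vocabulary; each column of D is one label's word weights.
labels = ["vegetarian", "liberal", "sports"]
vocab = ["tofu", "recipe", "senate", "vote", "goal"]
D = np.array([
    [0.6, 0.0, 0.0],
    [0.4, 0.0, 0.1],
    [0.0, 0.5, 0.0],
    [0.0, 0.5, 0.1],
    [0.0, 0.0, 0.8],
])

# A toy document built as an equal mix of the first two label columns.
x = np.array([0.3, 0.2, 0.25, 0.25, 0.0])

a = nonneg_sparse_code(D, x)
for label, weight in zip(labels, a):
    print(f"{label}: {weight:.3f}")
```

In this toy setting the recovered code should put weight on "vegetarian" and "liberal" and none on "sports", which mirrors the idea that the label weights, rather than the specific words, describe the document.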