Slow convergence and poor initial accuracy are two problems that plague efforts to use very large feature sets in online learning. This is especially true when only a few features are "active" in any training example and the frequency of activation varies widely across features. We show how these problems can be mitigated when a graph of relationships between features is known. We study this problem in a fully Bayesian setting, focusing on the use of Facebook user-IDs as features, with the social network providing the relationship structure. Our analysis uncovers significant problems with the obvious regularization schemes and motivates a two-component mixture-model "social prior" that is provably better. Empirical results on large-scale click prediction problems show that our algorithm can match the baseline's accuracy with 12M fewer training examples and continues to outperform it through more than 60M examples. On a second problem using binned features, our model outperforms the baseline even after the latter sees 5x as much data.