Slow convergence and poor initial accuracy are two problems that plague efforts to use very large feature sets in online learning. This is especially true when only a few features are "active" in any training example and the frequency of activation varies widely across features. We show how these problems can be mitigated when a graph of relationships between features is known. We study this problem in a fully Bayesian setting, focusing on the use of Facebook user-IDs as features, with the social network providing the relationship structure. Our analysis uncovers significant problems with the obvious regularization schemes and motivates a two-component mixture-model "social prior" that is provably better. Empirical results on large-scale click prediction problems show that our algorithm can match the baseline's accuracy with 12M fewer training examples and continues to outperform it through more than 60M examples. On a second problem using binned features, our model outperforms the baseline even after the latter sees 5x as much data.