Scalable inference in latent variable models

  • Authors:
  • Amr Ahmed; Mohamed Aly; Joseph Gonzalez; Shravan Narayanamurthy; Alexander J. Smola

  • Affiliations:
  • Yahoo! Research, Santa Clara, CA, USA; Yahoo! Research, Santa Clara, CA, USA; Carnegie Mellon University, Santa Clara, CA, USA; Yahoo! Research, Santa Clara, CA, USA; Yahoo! Research, Santa Clara, CA, USA

  • Venue:
  • Proceedings of the fifth ACM international conference on Web search and data mining (WSDM '12)
  • Year:
  • 2012

Abstract

Latent variable techniques are pivotal in tasks ranging from predicting user click patterns and targeting ads to organizing the news and managing user-generated content. Techniques such as topic modeling, clustering, and subspace estimation provide substantial insight into the latent structure of complex data with little or no external guidance, making them ideal for reasoning about large-scale, rapidly evolving datasets. Unfortunately, due to the data dependencies and global state introduced by latent variables and the iterative nature of latent variable inference, these techniques are often prohibitively expensive to apply to large-scale, streaming datasets. In this paper we present a scalable parallel framework for efficient inference in latent variable models over streaming web-scale data. Our framework addresses three key challenges: 1) synchronizing the global state, which includes global latent variables (e.g., cluster centers and dictionaries); 2) efficiently storing and retrieving the large local state, which includes the data points and their corresponding latent variables (e.g., cluster membership); and 3) sequentially incorporating streaming data (e.g., the news). We address these challenges by introducing: 1) a novel delta-based aggregation system with a bandwidth-efficient communication protocol; 2) schedule-aware out-of-core storage; and 3) approximate forward sampling to rapidly incorporate new data. We demonstrate state-of-the-art performance of our framework by easily tackling datasets two orders of magnitude larger than those addressed by the current state of the art. Furthermore, we provide an optimized and easily customizable open-source implementation of the framework.
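
The delta-based aggregation idea mentioned in the abstract can be illustrated with a minimal sketch: a worker keeps a local cache of a global count table (e.g., topic-word counts), accumulates the changes it makes, and at synchronization time transmits only the non-zero deltas before refreshing its cache. The class name and the single-process server dictionary below are hypothetical stand-ins, not the paper's actual distributed implementation or wire protocol.

```python
from collections import defaultdict

class DeltaSyncWorker:
    """Sketch of delta-based synchronization of a shared count table.

    `server` is a plain dict standing in for the distributed global state;
    the real system shards this state and uses a bandwidth-efficient protocol.
    """

    def __init__(self, server):
        self.server = server                 # stand-in for the global state store
        self.local = defaultdict(int)        # worker's cached copy of global counts
        self.delta = defaultdict(int)        # changes accumulated since last sync

    def update(self, key, change):
        """Apply a local update (e.g., a count increment or decrement)."""
        self.local[key] += change
        self.delta[key] += change

    def synchronize(self):
        """Push only the non-zero deltas, then pull the merged global values."""
        for key, change in self.delta.items():
            if change != 0:
                self.server[key] = self.server.get(key, 0) + change
        self.delta.clear()
        # Refresh the local cache with the (possibly newer) global values.
        for key in list(self.local):
            self.local[key] = self.server.get(key, 0)
```

Sending only deltas rather than the full table is what keeps communication proportional to the amount of local change, which matters when the global state is large and updates between synchronizations are sparse.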
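The approximate forward sampling component can likewise be sketched for an LDA-style topic model: each word of a newly arrived document gets its topic drawn once from the predictive distribution implied by the current global counts, so the document is incorporated in a single pass without revisiting older data. The function name, hyperparameters, and count-array layout below are illustrative assumptions, not the framework's API or exact sampler.

```python
import numpy as np

def forward_sample_document(doc, n_topic_word, n_topic, alpha=0.1, beta=0.01):
    """Assign a topic to each word id in `doc` with one forward pass.

    n_topic_word: (K, V) array of global topic-word counts.
    n_topic:      (K,) array of global per-topic totals.
    """
    K, V = n_topic_word.shape
    n_doc_topic = np.zeros(K)                       # local state for this document
    assignments = []
    for w in doc:
        # Predictive probability of each topic given counts seen so far.
        p = (n_doc_topic + alpha) * (n_topic_word[:, w] + beta) / (n_topic + V * beta)
        z = np.random.choice(K, p=p / p.sum())
        assignments.append(z)
        n_doc_topic[z] += 1
        n_topic_word[z, w] += 1                      # these increments become deltas
        n_topic[z] += 1                              # pushed to the global state
    return assignments, n_doc_topic
```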