Latent variable techniques are pivotal in tasks ranging from predicting user click patterns and targeting ads to organizing the news and managing user-generated content. Techniques such as topic modeling, clustering, and subspace estimation provide substantial insight into the latent structure of complex data with little or no external guidance, which makes them well suited to reasoning about large-scale, rapidly evolving datasets. Unfortunately, the data dependencies and global state introduced by latent variables, together with the iterative nature of latent variable inference, often make these techniques prohibitively expensive to apply to large-scale streaming datasets. In this paper we present a scalable parallel framework for efficient inference in latent variable models over streaming web-scale data. Our framework addresses three key challenges: 1) synchronizing the global state, which includes global latent variables (e.g., cluster centers and dictionaries); 2) efficiently storing and retrieving the large local state, which includes the data points and their corresponding latent variables (e.g., cluster membership); and 3) sequentially incorporating streaming data (e.g., the news). We address these challenges by introducing 1) a novel delta-based aggregation system with a bandwidth-efficient communication protocol; 2) schedule-aware out-of-core storage; and 3) approximate forward sampling to rapidly incorporate new data. We demonstrate state-of-the-art performance of our framework by tackling datasets two orders of magnitude larger than those addressed by the current state of the art. Furthermore, we provide an optimized and easily customizable open-source implementation of the framework.¹
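The idea behind delta-based aggregation of global state can be illustrated with a minimal single-process sketch. It assumes that the global state is a table of counts (in a topic model, for example, word-topic counts) and that each worker samples against a possibly stale local replica, sending only its accumulated changes (deltas) rather than its full copy. The classes `GlobalStore` and `Worker` and their methods are hypothetical names chosen for illustration. This is not the paper's system: it omits networking, concurrency, out-of-core storage, and the bandwidth-efficient protocol the abstract mentions.

```python
from collections import Counter
from typing import Dict, Hashable


class GlobalStore:
    """Hypothetical global state: a table of counts summed over all workers."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def apply_delta(self, delta: Dict[Hashable, int]) -> None:
        # Aggregation is a plain sum, which is commutative and associative,
        # so deltas from different workers can be applied in any order.
        for key, change in delta.items():
            self.counts[key] += change

    def snapshot(self) -> Dict[Hashable, int]:
        return dict(self.counts)


class Worker:
    """Hypothetical worker holding a possibly stale replica of the global state."""

    def __init__(self, store: GlobalStore) -> None:
        self.store = store
        self.local: Counter = Counter(store.snapshot())  # replica used for sampling
        self.pending: Counter = Counter()  # local changes not yet sent

    def update(self, key: Hashable, change: int) -> None:
        # Apply the change to the replica immediately and record it as a delta.
        self.local[key] += change
        self.pending[key] += change

    def sync(self) -> Dict[Hashable, int]:
        # Send only keys whose accumulated change is nonzero,
        # rather than the whole (large) global table.
        delta = {k: v for k, v in self.pending.items() if v != 0}
        self.store.apply_delta(delta)
        self.pending.clear()
        # Refresh the replica so it also reflects other workers' deltas.
        self.local = Counter(self.store.snapshot())
        return delta


if __name__ == "__main__":
    store = GlobalStore()
    store.apply_delta({("apple", 0): 5, ("apple", 1): 2})
    a, b = Worker(store), Worker(store)

    # Each worker independently reassigns one "apple" token from topic 0 to topic 1.
    for w in (a, b):
        w.update(("apple", 0), -1)
        w.update(("apple", 1), +1)

    a.sync()
    b.sync()
    print(store.snapshot())  # {('apple', 0): 3, ('apple', 1): 4}
```

Sending additive deltas is what keeps concurrent updates from clobbering each other: had each worker written its full replica back instead, `b`'s write (topic 0: 4, topic 1: 3) would overwrite `a`'s, and one of the two reassignments would be lost. A change that cancels out locally (a token moved away from a topic and back again) produces an empty delta and costs no communication.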