Natural graphs, such as social networks, email graphs, or instant messaging patterns, have become pervasive on the internet. These graphs are massive, often containing hundreds of millions of nodes and billions of edges. While some theoretical models have been proposed to study such graphs, their analysis remains difficult because of the scale and nature of the data. We propose a framework for large-scale graph decomposition and inference. To cope with the scale, the framework is distributed: the data are partitioned over a shared-nothing set of machines. We propose a novel factorization technique that relies on partitioning a graph so as to minimize the number of neighboring vertices, rather than edges, across partitions. Our decomposition is based on a streaming algorithm and is network-aware, adapting to the network topology of the underlying computational hardware. We keep local copies of the variables and use an efficient asynchronous communication protocol to synchronize the replicated values, so that most of the computation proceeds without incurring the cost of network communication. On a graph of 200 million vertices and 10 billion edges, derived from an email communication network, our algorithm retains convergence properties while allowing for almost linear scalability in the number of computers.
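To make the partitioning objective concrete, the following is a minimal sketch of a greedy streaming partitioner that tries to keep the number of neighboring (replicated) vertices per partition small, rather than the number of cut edges. It is an illustration only and not the algorithm from the paper: the function name `stream_partition`, the `capacity` balance rule, the tie-breaking by partition size, and the assumption that adjacency lists are symmetric are all hypothetical choices made for the example. Network awareness and the asynchronous synchronization protocol are not modeled.

```python
def stream_partition(adjacency_stream, num_parts, capacity):
    """Greedy one-pass vertex partitioning (illustrative sketch).

    adjacency_stream yields (vertex, neighbors) pairs exactly once per vertex.
    Each partition k keeps the vertices it owns plus a set of "borrowed"
    neighbors it would need local copies of. The greedy rule places each
    vertex where it adds the fewest new borrowed vertices.
    """
    owned = [set() for _ in range(num_parts)]
    borrowed = [set() for _ in range(num_parts)]
    assignment = {}

    for v, nbrs in adjacency_stream:
        nbrs = set(nbrs) - {v}  # ignore duplicates and self-loops
        best_k, best_cost = None, None
        for k in range(num_parts):
            if len(owned[k]) >= capacity:
                continue  # hypothetical hard balance constraint
            # Neighbors that partition k does not yet hold in any form.
            new_copies = sum(
                1 for u in nbrs if u not in owned[k] and u not in borrowed[k]
            )
            # Owning v removes it from k's borrowed set, saving one copy.
            saved = 1 if v in borrowed[k] else 0
            cost = (new_copies - saved, len(owned[k]))  # tie-break: smaller partition
            if best_cost is None or cost < best_cost:
                best_k, best_cost = k, cost
        if best_k is None:
            raise ValueError("all partitions are full; increase capacity")

        owned[best_k].add(v)
        borrowed[best_k].discard(v)
        borrowed[best_k].update(u for u in nbrs if u not in owned[best_k])
        assignment[v] = best_k

    return assignment, borrowed


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
toy_graph = [
    (0, [1, 2]),
    (1, [0, 2]),
    (2, [0, 1, 3]),
    (3, [2, 4, 5]),
    (4, [3, 5]),
    (5, [3, 4]),
]
assignment, borrowed = stream_partition(toy_graph, num_parts=2, capacity=3)
total_copies = sum(len(b) for b in borrowed)
```

On this toy graph each triangle ends up in its own partition, and each partition borrows only the single vertex at the other end of the bridging edge. The result of any such one-pass heuristic depends on the order in which vertices arrive.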