Distributed large-scale natural graph factorization

Authors:
Amr Ahmed;Nino Shervashidze;Shravan Narayanamurthy;Vanja Josifovski;Alexander J. Smola
Affiliations:
Google Inc., Mountain View, CA, USA;INRIA, ENS, Paris, France;Microsoft, Bangalore, India;Google Inc., Mountain View, CA, USA;Carnegie Mellon University, Pittsburgh, PA, USA
Venue:
Proceedings of the 22nd international conference on World Wide Web
Year:
2013

Abstract

Natural graphs, such as social networks, email graphs, or instant messaging patterns, have become pervasive through the internet. These graphs are massive, often containing hundreds of millions of nodes and billions of edges. While some theoretical models have been proposed to study such graphs, their analysis is still difficult due to the scale and nature of the data. We propose a framework for large-scale graph decomposition and inference. To resolve the scale, our framework is distributed so that the data are partitioned over a shared-nothing set of machines. We propose a novel factorization technique that relies on partitioning a graph so as to minimize the number of neighboring vertices rather than edges across partitions. Our decomposition is based on a streaming algorithm. It is network-aware as it adapts to the network topology of the underlying computational hardware. We use local copies of the variables and an efficient asynchronous communication protocol to synchronize the replicated values in order to perform most of the computation without having to incur the cost of network communication. On a graph of 200 million vertices and 10 billion edges, derived from an email communication network, our algorithm retains convergence properties while allowing for almost linear scalability in the number of computers.
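
The partitioning criterion in the abstract, minimizing neighboring vertices rather than edges across partitions, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy graph, and the two partition assignments are hypothetical, and the "replication cost" below simply assumes that each partition keeps one local copy of every outside vertex adjacent to it. Under that assumption, two partitionings with the same edge cut can need different numbers of replicated vertices.

```python
from collections import defaultdict


def edge_cut(edges, part):
    """Count edges whose endpoints are assigned to different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


def borrowed_vertices(edges, part):
    """Map each partition to the set of outside vertices adjacent to it.

    Each such vertex would need a local copy on that partition.
    """
    borrowed = defaultdict(set)
    for u, v in edges:
        if part[u] != part[v]:
            borrowed[part[u]].add(v)
            borrowed[part[v]].add(u)
    return borrowed


def replication_cost(edges, part):
    """Total number of local copies summed over all partitions."""
    return sum(len(s) for s in borrowed_vertices(edges, part).values())


# Hypothetical toy graph: a star centered on vertex 0 plus three disjoint edges.
edges = [(0, 1), (0, 2), (0, 3), (4, 5), (6, 7), (8, 9)]

# Assignment A cuts the three star edges; the other edges stay internal.
part_a = {0: 0, 1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 1, 7: 1, 8: 0, 9: 0}

# Assignment B keeps the star together and cuts the three disjoint edges.
part_b = {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 0, 7: 1, 8: 0, 9: 1}

for name, part in (("A", part_a), ("B", part_b)):
    print(name, "edge cut:", edge_cut(edges, part),
          "replicated vertices:", replication_cost(edges, part))
```

Both assignments cut three edges, but assignment A needs four local copies while assignment B needs six, because the cut star edges share the hub vertex. An edge-cut objective cannot tell the two apart, whereas a vertex-based objective prefers A.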