Streaming graph partitioning for large distributed graphs

Authors:
Isabelle Stanton;Gabriel Kliot
Affiliations:
University of California Berkeley, Berkeley, CA, USA;Microsoft Research, Redmond, WA, USA
Venue:
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2012

Abstract

Extracting knowledge by performing computations on graphs is becoming increasingly challenging as graphs grow in size. A standard approach distributes the graph over a cluster of nodes, but performing computations on a distributed graph is expensive if large amounts of data have to be moved. Without partitioning the graph, communication quickly becomes a limiting factor in scaling the system up. Existing graph partitioning heuristics incur high computation and communication cost on large graphs, sometimes as high as the future computation itself. Observing that the graph has to be loaded into the cluster, we ask if the partitioning can be done at the same time with a lightweight streaming algorithm. We propose natural, simple heuristics and compare their performance to hashing and METIS, a fast, offline heuristic. We show on a large collection of graph datasets that our heuristics are a significant improvement, with the best obtaining an average gain of 76%. The heuristics are scalable in the size of the graphs and the number of partitions. Using our streaming partitioning methods, we are able to speed up PageRank computations on Spark, a distributed computation system, by 18% to 39% for large social networks.
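
The abstract does not spell out the heuristics, so the following is only an illustrative sketch of what a one-pass streaming partitioner might look like next to the hashing baseline. It assumes a greedy rule that places each arriving vertex with the partition already holding most of its neighbors, scaled by that partition's remaining capacity. The function names, the `capacity` parameter, and the toy graph are all assumptions made for this example and are not taken from the record above.

```python
def hash_partition(stream, k):
    """Baseline: assign each vertex by its id modulo k (ids assumed to be ints)."""
    return {v: v % k for v, _ in stream}


def greedy_stream_partition(stream, k, capacity):
    """Single pass over (vertex, neighbors) pairs in arrival order.

    Assumed rule: score partition i by
        (#already-placed neighbors in i) * (1 - size_i / capacity)
    and break ties in favor of the least-loaded partition.
    """
    assignment = {}
    sizes = [0] * k
    for v, neighbors in stream:
        counts = [0] * k
        for u in neighbors:
            p = assignment.get(u)  # None if u has not arrived yet
            if p is not None:
                counts[p] += 1
        best = max(
            range(k),
            key=lambda i: (counts[i] * (1 - sizes[i] / capacity), -sizes[i]),
        )
        assignment[v] = best
        sizes[best] += 1
    return assignment


def cut_edges(stream, assignment):
    """Count undirected edges whose endpoints land in different partitions."""
    cut = 0
    for v, neighbors in stream:
        for u in neighbors:
            if v < u and assignment[v] != assignment[u]:
                cut += 1
    return cut


# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
stream = [
    (0, [1, 2]), (1, [0, 2]), (2, [0, 1, 3]),
    (3, [2, 4, 5]), (4, [3, 5]), (5, [3, 4]),
]

greedy = greedy_stream_partition(stream, k=2, capacity=3)
hashed = hash_partition(stream, k=2)
print(cut_edges(stream, greedy))  # 1
print(cut_edges(stream, hashed))  # 5
```

On this toy input the greedy pass keeps each triangle together and cuts only the bridging edge, while hashing scatters neighbors across partitions. The numbers illustrate the idea only and say nothing about the gains reported in the abstract.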