Balanced graph partitioning in the streaming setting is a key problem for enabling scalable and efficient computation on massive graph data such as web graphs, knowledge graphs, and graphs arising in online social networks. Two families of heuristics for streaming graph partitioning are in wide use: place the newly arrived vertex in the cluster with the largest number of its neighbors, or in the cluster with the fewest non-neighbors. In this work, we introduce a framework that unifies these two seemingly orthogonal heuristics and allows us to quantify the interpolation between them. More generally, the framework enables a well-principled design of scalable streaming graph partitioning algorithms that are amenable to distributed implementation. We derive a novel one-pass streaming graph partitioning algorithm and show, on an extensive set of real-world and synthetic graphs, that it yields significant performance improvements over previous approaches. Surprisingly, even though our algorithm makes only a single pass over the stream, we found its performance to be comparable in many cases to the de facto standard offline software METIS, and in some cases even superior. For instance, for the Twitter graph with more than 1.4 billion edges, our method partitions the graph in about 40 minutes, achieving a balanced partition that cuts as few as 6.8% of the edges, whereas METIS took more than 8½ hours to produce a balanced partition that cuts 11.98% of the edges. We also demonstrate the gains in communication cost and runtime obtained by using our graph partitioner when running a standard PageRank computation on a graph processing platform.
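To make the two heuristics concrete, the following is a minimal sketch of a one-pass greedy partitioner, not the algorithm derived in the paper. It assumes a hypothetical score of the form (neighbors already in the cluster) minus `lam` times (cluster size). With `lam = 0` this picks the cluster holding the most neighbors; with `lam = 1` it is equivalent to picking the cluster with the fewest non-neighbors, since the non-neighbor count is the cluster size minus the neighbor count. The names `stream_partition`, `cut_fraction`, `lam`, and the hard `capacity` bound are illustrative assumptions.

```python
def stream_partition(stream, k, lam, capacity):
    """One-pass greedy partitioning (illustrative sketch).

    stream   -- iterable of (vertex, neighbor_list) in arrival order
    k        -- number of clusters
    lam      -- assumed interpolation weight: 0 = most neighbors,
                1 = fewest non-neighbors
    capacity -- assumed hard upper bound on cluster size (balance)
    """
    assignment = {}   # vertex -> cluster index
    sizes = [0] * k   # current size of each cluster
    for v, nbrs in stream:
        # Count already-placed neighbors of v in each cluster.
        counts = [0] * k
        for u in nbrs:
            p = assignment.get(u)
            if p is not None:
                counts[p] += 1
        best, best_score = None, None
        for i in range(k):
            if sizes[i] >= capacity:
                continue  # cluster is full, skip it
            score = counts[i] - lam * sizes[i]
            # Break score ties in favor of the smaller cluster.
            if (best is None or score > best_score
                    or (score == best_score and sizes[i] < sizes[best])):
                best, best_score = i, score
        if best is None:
            raise ValueError("all clusters are full; increase capacity")
        assignment[v] = best
        sizes[best] += 1
    return assignment


def cut_fraction(edges, assignment):
    """Fraction of edges whose endpoints lie in different clusters."""
    cut = sum(1 for u, w in edges if assignment[u] != assignment[w])
    return cut / len(edges)


# Toy input: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adjacency = {v: [] for v in range(6)}
for u, w in edges:
    adjacency[u].append(w)
    adjacency[w].append(u)
stream = [(v, adjacency[v]) for v in range(6)]

parts = stream_partition(stream, k=2, lam=0.0, capacity=3)
```

On this toy stream with `lam = 0.0`, each triangle lands in its own cluster and only the bridging edge is cut. Intermediate values of `lam` trade neighbor affinity against cluster size, which is the kind of interpolation the abstract refers to; the actual objective is defined in the body of the paper.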