Extracting knowledge by performing computations on graphs is becoming increasingly challenging as graphs grow in size. A standard approach distributes the graph over a cluster of nodes, but performing computations on a distributed graph is expensive if large amounts of data have to be moved. Unless the graph is partitioned well, communication quickly becomes the limiting factor in scaling the system up. Existing graph partitioning heuristics incur high computation and communication costs on large graphs, sometimes as high as the cost of the future computation itself. Observing that the graph has to be loaded into the cluster anyway, we ask whether the partitioning can be done at the same time, using a lightweight streaming algorithm. We propose natural, simple heuristics and compare their performance to hashing and to METIS, a fast, offline heuristic. We show on a large collection of graph datasets that our heuristics are a significant improvement, with the best obtaining an average gain of 76%. The heuristics scale with both the size of the graphs and the number of partitions. Using our streaming partitioning methods, we are able to speed up PageRank computations on Spark, a distributed computation system, by 18% to 39% for large social networks.
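To make the idea of partitioning while loading concrete, the following is a minimal Python sketch of one plausible streaming heuristic next to a hashing baseline. It is an illustration under stated assumptions, not necessarily one of the heuristics evaluated in the paper: the function names, the capacity-discounted neighbor score, the tie-breaking rule, and the toy graph are all hypothetical, and the modulo assignment merely stands in for hashing.

```python
def hash_partition(vertices, k):
    """Baseline: place each vertex by its id modulo k (a stand-in for hashing)."""
    return {v: v % k for v in vertices}


def greedy_stream_partition(stream, k, capacity):
    """One-pass greedy placement (hypothetical scoring rule).

    Each arriving vertex goes to the non-full partition that already holds
    the most of its neighbors, discounted by how full that partition is.
    Ties go to the less loaded partition, then to the lower index.
    """
    assignment = {}
    sizes = [0] * k
    for v, neighbors in stream:
        best, best_key = None, None
        for p in range(k):
            if sizes[p] >= capacity:
                continue  # partition is full, skip it
            placed = sum(1 for u in neighbors if assignment.get(u) == p)
            score = placed * (1.0 - sizes[p] / capacity)
            key = (score, -sizes[p])
            if best_key is None or key > best_key:
                best, best_key = p, key
        if best is None:
            raise ValueError("total capacity is smaller than the number of vertices")
        assignment[v] = best
        sizes[best] += 1
    return assignment


def cut_edges(adjacency, assignment):
    """Count undirected edges whose endpoints land in different partitions."""
    return sum(
        1
        for v, nbrs in adjacency.items()
        for u in nbrs
        if v < u and assignment[v] != assignment[u]
    )


# Toy graph (an assumption for illustration): two 4-cliques joined by one edge.
adjacency = {v: set() for v in range(8)}
for clique in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for a in clique:
        for b in clique:
            if a != b:
                adjacency[a].add(b)
adjacency[3].add(4)
adjacency[4].add(3)

# Vertices arrive in id order, each with its full adjacency list.
stream = [(v, adjacency[v]) for v in range(8)]
greedy = greedy_stream_partition(stream, k=2, capacity=4)
hashed = hash_partition(adjacency.keys(), k=2)
greedy_cut = cut_edges(adjacency, greedy)
hashed_cut = cut_edges(adjacency, hashed)
```

On this toy graph the greedy sketch keeps each clique together and cuts only the single bridging edge, while the modulo baseline cuts 9 of the 13 edges. Real inputs, stream orderings, and balance constraints will behave differently, so this shows only the mechanism, not the reported gains.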