Parallel multilevel k-way partitioning scheme for irregular graphs
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
New spectral bounds on k-partitioning of graphs
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Statistical properties of community structure in large social and information networks
Proceedings of the 17th international conference on World Wide Web
PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations
ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
Kronecker Graphs: An Approach to Modeling Networks
The Journal of Machine Learning Research
Pregel: a system for large-scale graph processing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Multilevel algorithms for partitioning power-law graphs
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
HAMA: An Efficient Matrix Computation with the MapReduce Framework
CLOUDCOM '10 Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science
Kineograph: taking the pulse of a fast-changing and connected world
Proceedings of the 7th ACM european conference on Computer Systems
Distributed GraphLab: a framework for machine learning and data mining in the cloud
Proceedings of the VLDB Endowment
Streaming graph partitioning for large distributed graphs
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
PowerGraph: distributed graph-parallel computation on natural graphs
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Massive streaming data analytics: a graph-based approach
XRDS: Crossroads, The ACM Magazine for Students - Scientific Computing
Hi-index | 0.00 |
Graph abstraction is essential for many applications from finding a shortest path to executing complex machine learning (ML) algorithms like collaborative filtering. Graph construction from raw data for various applications is becoming challenging, due to exponential growth in data, as well as the need for large scale graph processing. Since graph construction is a data-parallel problem, MapReduce is well-suited for this task. We developed GraphBuilder, a scalable framework for graph Extract-Transform-Load (ETL), to offload many of the complexities of graph construction, including graph formation, tabulation, transformation, partitioning, output formatting, and serialization. GraphBuilder is written in Java, for ease of programming, and it scales using the MapReduce model. In this paper, we describe the motivation for GraphBuilder, its architecture, MapReduce algorithms, and performance evaluation of the framework. Since large graphs should be partitioned over a cluster for storing and processing and partitioning methods have significant performance impacts, we develop several graph partitioning methods and evaluate their performance. We also open source the framework at https://01.org/graphbuilder/.