GraphBuilder: scalable graph ETL framework

  • Authors: Nilesh Jain; Guangdeng Liao; Theodore L. Willke
  • Affiliation: Systems Architecture Lab, Intel Corporation, Hillsboro, OR (all authors)
  • Venue: First International Workshop on Graph Data Management Experiences and Systems
  • Year: 2013

Abstract

Graph abstraction is essential for many applications, from finding a shortest path to executing complex machine learning (ML) algorithms such as collaborative filtering. Constructing graphs from raw data is becoming increasingly challenging because of exponential data growth and the need for large-scale graph processing. Because graph construction is a data-parallel problem, MapReduce is well suited to the task. We developed GraphBuilder, a scalable framework for graph Extract-Transform-Load (ETL), to offload many of the complexities of graph construction, including graph formation, tabulation, transformation, partitioning, output formatting, and serialization. GraphBuilder is written in Java for ease of programming and scales using the MapReduce model. In this paper, we describe the motivation for GraphBuilder, its architecture, its MapReduce algorithms, and a performance evaluation of the framework. Because large graphs should be partitioned across a cluster for storage and processing, and the partitioning method has a significant impact on performance, we develop several graph partitioning methods and evaluate their performance. The framework is open source and available at https://01.org/graphbuilder/.
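To make the data-parallel framing concrete, the following is a minimal single-machine sketch in plain Java of two of the steps the abstract names: graph formation (parsing raw records into edges and grouping them by source vertex, as a map step followed by a reduce step would) and partitioning (assigning each vertex to one of a fixed number of partitions). It is not GraphBuilder's API and does not use Hadoop. The class `GraphEtlSketch`, its methods, the whitespace-separated "source destination" record format, and the hash-based partitioning rule are all assumptions made for illustration; the abstract does not say which partitioning methods the paper evaluates.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not part of GraphBuilder.
public class GraphEtlSketch {

    // "Map" step: parse one raw record into a {source, destination} pair.
    // Assumes records are two whitespace-separated vertex identifiers.
    static String[] parseEdge(String record) {
        String[] parts = record.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected 'source destination': " + record);
        }
        return parts;
    }

    // "Reduce" step: group edges by source vertex and drop duplicate edges.
    // Sorted collections are used so the result has a deterministic order.
    static Map<String, Set<String>> buildAdjacency(List<String> records) {
        Map<String, Set<String>> adjacency = new TreeMap<>();
        for (String record : records) {
            String[] edge = parseEdge(record);
            adjacency.computeIfAbsent(edge[0], k -> new TreeSet<>()).add(edge[1]);
        }
        return adjacency;
    }

    // Assumed partitioning rule: hash the vertex identifier into [0, numPartitions).
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    static int partitionOf(String vertex, int numPartitions) {
        return Math.floorMod(vertex.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a b", "a c", "a b", "b c");
        Map<String, Set<String>> graph = buildAdjacency(records);
        for (Map.Entry<String, Set<String>> entry : graph.entrySet()) {
            System.out.println("partition " + partitionOf(entry.getKey(), 4)
                    + ": " + entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

In a real MapReduce job the parsing would run in parallel mappers and the grouping in reducers keyed by vertex. Hashing is only the simplest possible placement rule and is used here because it needs no global knowledge of the graph.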