GraphX: a resilient distributed graph system on Spark

Authors:
Reynold S. Xin; Joseph E. Gonzalez; Michael J. Franklin; Ion Stoica
Affiliations:
AMPLab, EECS, UC Berkeley; AMPLab, EECS, UC Berkeley; AMPLab, EECS, UC Berkeley; AMPLab, EECS, UC Berkeley
Venue:
First International Workshop on Graph Data Management Experiences and Systems
Year:
2013

Citing 8
Cited 2

Pregel: a system for large-scale graph processing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

On Two-Dimensional Sparse Matrix Partitioning: Models, Methods, and a Recipe
SIAM Journal on Scientific Computing

Multilevel algorithms for partitioning power-law graphs
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Signal/collect: graph algorithms for the (semantic) web
ISWC'10 Proceedings of the 9th international semantic web conference on The semantic web - Volume Part I

The Combinatorial BLAS: design, implementation, and applications
International Journal of High Performance Computing Applications

Kineograph: taking the pulse of a fast-changing and connected world
Proceedings of the 7th ACM european conference on Computer Systems

Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation

PowerGraph: distributed graph-parallel computation on natural graphs
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation

Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles

Naiad: a timely dataflow system
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

Abstract

From social networks to targeted advertising, big graphs capture the structure in data and are central to recent advances in machine learning and data mining. Unfortunately, directly applying existing data-parallel tools to graph computation tasks can be cumbersome and inefficient. The need for intuitive, scalable tools for graph computation has led to the development of new graph-parallel systems (e.g., Pregel, PowerGraph) that are designed to execute graph algorithms efficiently. However, these new graph-parallel systems do not address the challenges of graph construction and transformation, which are often just as problematic as the subsequent computation. Furthermore, existing graph-parallel systems provide limited fault tolerance and limited support for interactive data mining. We introduce GraphX, which combines the advantages of both data-parallel and graph-parallel systems by efficiently expressing graph computation within the Spark data-parallel framework. We leverage new ideas in distributed graph representation to efficiently distribute graphs as tabular data structures. Similarly, we leverage advances in data-flow systems to exploit in-memory computation and fault tolerance. We provide powerful new operations to simplify graph construction and transformation. Using these primitives, we implement the PowerGraph and Pregel abstractions in fewer than 20 lines of code. Finally, by exploiting the Scala foundation of Spark, we enable users to interactively load, transform, and compute on massive graphs.
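
To make the idea of expressing a graph-parallel abstraction over tabular data more concrete, the following is a minimal single-machine sketch of a Pregel-style superstep loop in Python. It is not GraphX's API and not the authors' implementation: the function `pregel`, its parameters (`vprog`, `send_msg`, `merge_msg`), and the toy graph are all hypothetical names chosen for illustration. The vertex table is assumed to be a dictionary and the edge table a list of tuples, and each superstep is written as a scan over the edge table followed by a keyed merge of messages, which is roughly the kind of work a data-parallel engine would perform with joins and aggregations.

```python
import math


def pregel(vertices, edges, initial_msg, vprog, send_msg, merge_msg, max_iters=20):
    """Pregel-style loop over a vertex table and an edge table (illustrative only).

    vertices: dict mapping vertex id -> attribute
    edges:    list of (src, dst, edge_attr) tuples
    """
    # Superstep 0: every vertex processes the initial message.
    state = {vid: vprog(vid, attr, initial_msg) for vid, attr in vertices.items()}
    active = set(state)

    for _ in range(max_iters):
        # Scan the edge table, look up both endpoint attributes (a join),
        # and merge messages addressed to the same vertex (an aggregation).
        inbox = {}
        for src, dst, eattr in edges:
            if src not in active:
                continue
            for target, msg in send_msg(src, state[src], dst, state[dst], eattr):
                inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg

        if not inbox:
            break  # no messages in flight, so the computation has converged

        # Run the vertex program only on vertices that received a message.
        for vid, msg in inbox.items():
            state[vid] = vprog(vid, state[vid], msg)
        active = set(inbox)

    return state


# Usage: single-source shortest paths from vertex 1 on a small weighted graph.
vertex_table = {1: 0.0, 2: math.inf, 3: math.inf, 4: math.inf}
edge_table = [(1, 2, 1.0), (1, 3, 4.0), (2, 3, 2.0), (3, 4, 1.0)]


def relax(src, src_dist, dst, dst_dist, weight):
    # Send a message only when it would shorten the destination's distance.
    if src_dist + weight < dst_dist:
        return [(dst, src_dist + weight)]
    return []


distances = pregel(
    vertex_table,
    edge_table,
    initial_msg=math.inf,
    vprog=lambda vid, dist, msg: min(dist, msg),
    send_msg=relax,
    merge_msg=min,
)
print(distances)
```

In this sketch the whole abstraction is a short loop because the per-superstep work reduces to table operations. A distributed system would additionally have to partition the vertex and edge tables and recover from failures, which the sketch does not attempt.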