GraphChi: large-scale graph computation on just a PC

Authors: Aapo Kyrola; Guy Blelloch; Carlos Guestrin
Affiliations: Carnegie Mellon University; Carnegie Mellon University; University of Washington
Venue: OSDI'12, Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation
Year: 2012

Abstract

Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis of social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms remains challenging, especially for non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only a fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.
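
The abstract names two ingredients: breaking the graph into small parts and the parallel sliding windows (PSW) method. The toy sketch below illustrates how these could fit together, assuming the layout described in the full paper: vertices are split into P contiguous intervals, and shard p holds every edge whose destination lies in interval p, sorted by source. Everything here is in memory, and the names make_intervals, build_shards and psw_iterations are hypothetical illustrations, not GraphChi's actual API.

```python
def make_intervals(num_vertices, num_shards):
    # Split vertex ids 0..num_vertices-1 into num_shards contiguous [lo, hi) intervals.
    size = -(-num_vertices // num_shards)  # ceiling division
    return [(min(i * size, num_vertices), min((i + 1) * size, num_vertices))
            for i in range(num_shards)]


def build_shards(edges, intervals):
    # Shard p holds every edge whose destination falls in interval p, sorted by source.
    shards = [[] for _ in intervals]
    for src, dst in edges:
        for p, (lo, hi) in enumerate(intervals):
            if lo <= dst < hi:
                shards[p].append((src, dst))
                break
    for shard in shards:
        shard.sort()  # tuples sort by source first
    return shards


def psw_iterations(shards, intervals):
    # For each interval p, yield its in-edges (all of shard p) and its out-edges,
    # gathered from a window in every shard. Because shards are sorted by source
    # and intervals are processed in order, each window begins where the previous
    # interval's window ended, so the windows only ever slide forward.
    pos = [0] * len(shards)  # start of the current window in each shard
    for p, (lo, hi) in enumerate(intervals):
        in_edges = shards[p]
        out_edges = []
        for q, shard in enumerate(shards):
            start = end = pos[q]
            while end < len(shard) and shard[end][0] < hi:
                end += 1
            out_edges.extend(shard[start:end])
            pos[q] = end  # slide the window forward for the next interval
        yield p, in_edges, out_edges


if __name__ == "__main__":
    edges = [(0, 4), (1, 2), (2, 0), (3, 1), (4, 5), (5, 3), (1, 4)]
    intervals = make_intervals(6, 2)
    shards = build_shards(edges, intervals)
    for p, in_edges, out_edges in psw_iterations(shards, intervals):
        print(p, intervals[p], "in:", in_edges, "out:", out_edges)
```

In this sketch, each interval's subgraph is assembled from one whole shard plus one contiguous block of every other shard. If the shards were files on disk, that would amount to roughly one full read and P - 1 sequential reads per interval, which suggests why such a layout can perform acceptably on rotational drives as well as SSDs.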