The input/output complexity of sorting and related problems
Communications of the ACM
A case for redundant arrays of inexpensive disks (RAID)
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Parallel and distributed computation: numerical methods
Parallel and distributed computation: numerical methods
A bridging model for parallel computation
Communications of the ACM
Multilevel k-way partitioning scheme for irregular graphs
Journal of Parallel and Distributed Computing
External-memory graph algorithms
Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms
A survey of out-of-core algorithms in numerical linear algebra
External memory algorithms
I/O-efficient techniques for computing pagerank
Proceedings of the eleventh international conference on Information and knowledge management
Compact representations of separable graphs
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
The Buffer Tree: A New Technique for Optimal I/O-Algorithms (Extended Abstract)
WADS '95 Proceedings of the 4th International Workshop on Algorithms and Data Structures
ESA '98 Proceedings of the 6th Annual European Symposium on Algorithms
The webgraph framework I: compression techniques
Proceedings of the 13th international conference on World Wide Web
Group formation in large social networks: membership, growth, and evolution
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Large-Scale Parallel Collaborative Filtering for the Netflix Prize
AAIM '08 Proceedings of the 4th international conference on Algorithmic Aspects in Information and Management
ACM SIGIR Forum
On compressing social networks
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations
ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
What is Twitter, a social network or a news media?
Proceedings of the 19th international conference on World wide web
Pregel: a system for large-scale graph processing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Large graph processing in the cloud
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model
Theory of Computing Systems - Special Title: Parallelism on Algorithms and Architectures (SPAA); Guest Editors: Cyril Gavoille, Boaz Patt-Shamir and Christian Scheideler
Spark: cluster computing with working sets
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Piccolo: building fast, distributed programs with partitioned tables
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Counting triangles and the curse of the last reducer
Proceedings of the 20th international conference on World wide web
SSDAlloc: hybrid SSD/RAM memory management made easy
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Triangle listing in massive networks and its applications
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Beyond 'Caveman Communities': Hubs and Spokes for Graph Compression and Mining
ICDM '11 Proceedings of the 2011 IEEE 11th International Conference on Data Mining
Kineograph: taking the pulse of a fast-changing and connected world
Proceedings of the 7th ACM european conference on Computer Systems
Distributed GraphLab: a framework for machine learning and data mining in the cloud
Proceedings of the VLDB Endowment
Parallel and I/O efficient set covering algorithms
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
PowerGraph: distributed graph-parallel computation on natural graphs
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
PowerGraph: distributed graph-parallel computation on natural graphs
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Ligra: a lightweight graph processing framework for shared memory
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Mizan: a system for dynamic load balancing in large-scale graph processing
Proceedings of the 8th ACM European Conference on Computer Systems
Trinity: a distributed graph engine on a memory cloud
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Scale-up graph processing: a storage-centric view
First International Workshop on Graph Data Management Experiences and Systems
TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Restreaming graph partitioning: simple versatile algorithms for advanced balancing
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient data partitioning model for heterogeneous graphs in the cloud
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
External memory K-bisimulation reduction of big graphs
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Computing infrastructure for big data processing
Frontiers of Computer Science: Selected Publications from Chinese Universities
DrunkardMob: billions of random walks on just a PC
Proceedings of the 7th ACM conference on Recommender systems
Analysis of partitioning strategies for graph processing in bulk synchronous parallel models
Proceedings of the fifth international workshop on Cloud data management
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
A lightweight infrastructure for graph analytics
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
X-Stream: edge-centric graph processing using streaming partitions
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Scale-up vs scale-out for Hadoop: time to rethink?
Proceedings of the 4th annual Symposium on Cloud Computing
Giraphx: parallel yet serializable large-scale graph processing
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
BDMPI: conquering BigData with small clusters using MPI
DISCS-2013 Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems
The energy case for graph processing on hybrid CPU and GPU systems
IA^3 '13 Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms
The inclusion-exclusion rule and its application to the junction tree algorithm
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Scaling queries over big RDF graphs with semantic hash partitioning
Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment
Fast iterative graph computation with block updates
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.