Scaling iterative graph processing applications to large graphs is an important problem. Performance is critical, as data scientists need to execute graph programs many times with varying parameters. The need for a high-level, high-performance programming model has inspired much research on graph programming frameworks. In this paper, we show that the important class of computationally light graph applications - applications that perform little computation per vertex - scales poorly across multiple cores, because these applications hit an early "memory wall" that limits their speedup. We propose a novel block-oriented computation model, in which computation is iterated locally over blocks of highly connected nodes, significantly increasing the amount of computation per cache miss. Following this model, we describe the design and implementation of a block-aware graph processing runtime that keeps the familiar vertex-centric programming paradigm while reaping the benefits of block-oriented execution. Our experiments show that block-oriented execution significantly improves the performance of our framework for several graph applications.
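As a rough illustration of the block-oriented idea described above, the following Python sketch runs a vertex-centric update function repeatedly inside one block of vertices before moving on to the next block. It is not the paper's runtime: the names `run_blocked`, `min_label_update`, and `max_inner`, the hand-written block assignment, and the choice of minimum-label propagation (connected components) as the example application are all assumptions made for illustration. A real system would obtain blocks from a graph partitioner and would size them to fit in cache.

```python
from typing import Callable, Dict, List, Tuple

Graph = Dict[int, List[int]]  # undirected graph as adjacency lists


def min_label_update(v: int, graph: Graph, label: Dict[int, int]) -> bool:
    """Vertex-centric update: adopt the smallest label among v and its neighbors.

    Returns True if the label of v changed.
    """
    best = min([label[v]] + [label[u] for u in graph[v]])
    changed = best < label[v]
    label[v] = best
    return changed


def run_blocked(
    graph: Graph,
    blocks: List[List[int]],
    update: Callable[[int, Graph, Dict[int, int]], bool],
    max_inner: int = 10,
) -> Tuple[Dict[int, int], int]:
    """Iterate locally over each block, then sweep over blocks until nothing changes.

    `blocks` is a hypothetical, precomputed partition of the vertices.
    `max_inner` caps the number of local passes per block visit.
    """
    label = {v: v for v in graph}  # every vertex starts with its own id
    outer_sweeps = 0
    while True:
        outer_sweeps += 1
        any_change = False
        for block in blocks:
            # Inner loop: reuse the block's data while it is (ideally) in cache.
            for _ in range(max_inner):
                changed = False
                for v in block:
                    if update(v, graph, label):
                        changed = True
                if not changed:
                    break
                any_change = True
        if not any_change:
            return label, outer_sweeps


# Two components: the path 0-1-2-3 and the edge 4-5.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
blocks = [[0, 1], [2, 3], [4, 5]]  # assumed block assignment
labels, sweeps = run_blocked(graph, blocks, min_label_update)
print(labels, sweeps)
```

The update function stays purely vertex-centric; only the scheduling changes. Labels only ever decrease, so the outer loop terminates once a full sweep over all blocks produces no change.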