Data-intensive, graph-based computations are pervasive in several scientific applications and are known to be quite challenging to implement on distributed-memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse-matrix-partitioning-based approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5 relative to a common vertex-based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD Magny-Cours-based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with a skewed degree distribution.
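To make the level-synchronous, vertex-partitioned strategy concrete, the following is a minimal single-process Python sketch. It is an illustration under stated assumptions, not the implementation evaluated in this work: the "ranks" are simulated with plain lists, the all-to-all exchange is simulated by nested loops rather than a message-passing library, and the names `owner` and `bfs_1d`, as well as the contiguous block partition, are hypothetical choices made for this example.

```python
def owner(v, n, p):
    """Rank that owns vertex v under a contiguous block partition
    of n vertices over p ranks (an assumed partitioning scheme)."""
    block = (n + p - 1) // p  # ceil(n / p) vertices per rank
    return v // block


def bfs_1d(adj, source, p):
    """Level-synchronous BFS with simulated 1D vertex partitioning.

    adj    : adjacency list, adj[u] is the list of neighbors of u
    source : start vertex
    p      : number of simulated ranks
    Returns a list of BFS levels (-1 for unreachable vertices).
    """
    n = len(adj)
    level = [-1] * n
    level[source] = 0

    # frontier[r] holds the current-level vertices owned by rank r.
    frontier = [[] for _ in range(p)]
    frontier[owner(source, n, p)].append(source)

    depth = 0
    while any(frontier):
        # Step 1: each rank scans its local frontier and buckets every
        # neighbor by the rank that owns it (the send buffers).
        outbox = [[[] for _ in range(p)] for _ in range(p)]
        for r in range(p):
            for u in frontier[r]:
                for v in adj[u]:
                    outbox[r][owner(v, n, p)].append(v)

        # Step 2: simulated all-to-all exchange. Each owner receives the
        # candidates sent to it and keeps those not yet visited.
        depth += 1
        frontier = [[] for _ in range(p)]
        for dst in range(p):
            for src in range(p):
                for v in outbox[src][dst]:
                    if level[v] == -1:
                        level[v] = depth
                        frontier[dst].append(v)
    return level


if __name__ == "__main__":
    # Small undirected graph; vertex 5 is isolated.
    graph = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3], []]
    print(bfs_1d(graph, source=0, p=2))
```

In this sketch every rank may send to every other rank at each level, which is the communication pattern that a two-dimensional partitioning aims to reduce by restricting exchanges to subsets of ranks.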