A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/L

Authors:
Andy Yoo; Edmond Chow; Keith Henderson; William McLendon; Bruce Hendrickson; Umit Catalyurek
Affiliations:
Lawrence Livermore National Laboratory, Livermore; D. E. Shaw Research and Development, New York; Lawrence Livermore National Laboratory, Livermore; Sandia National Laboratories, Albuquerque, NM; Sandia National Laboratories, Albuquerque, NM; Ohio State University, Columbus
Venue:
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Year:
2005

Abstract

Many emerging large-scale data science applications require searching large graphs distributed across multiple memories and processors. This paper presents a distributed breadth-first search (BFS) scheme that scales for random graphs with up to three billion vertices and 30 billion edges. Scalability was tested on IBM BlueGene/L with 32,768 nodes at the Lawrence Livermore National Laboratory. Scalability was obtained through a series of optimizations, in particular, those that ensure scalable use of memory. We use 2D (edge) partitioning of the graph instead of conventional 1D (vertex) partitioning to reduce communication overhead. For Poisson random graphs, we show that the expected size of the messages is scalable for both 2D and 1D partitionings. Finally, we have developed efficient collective communication functions for the 3D torus architecture of BlueGene/L that also take advantage of the structure in the problem. The performance and characteristics of the algorithm are measured and reported.
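
The difference between 1D (vertex) and 2D (edge) partitioning can be illustrated with a small single-process simulation. The sketch below is not the paper's implementation: the function names (`symmetrize`, `partition_2d`, `bfs_2d`) and the modulo-based mapping of edge (u, v) to grid processor (u mod R, v mod C) are assumptions chosen for brevity, and no real message passing takes place. It shows only the general idea that, on an R x C processor grid, a frontier vertex needs to reach the C processors of one grid row rather than all R*C processors. Setting C = 1 reduces the layout to a 1D vertex partitioning.

```python
from collections import defaultdict


def symmetrize(edges):
    """Return both directions of every undirected edge."""
    return [(u, v) for u, v in edges] + [(v, u) for u, v in edges]


def partition_2d(edges, R, C):
    """Assign each directed edge (u, v) to grid processor (u % R, v % C).

    This modulo mapping is an illustrative assumption, not the block
    layout used in the paper.
    """
    local = defaultdict(lambda: defaultdict(list))
    for u, v in edges:
        local[(u % R, v % C)][u].append(v)
    return local


def bfs_2d(edges, source, R, C):
    """Level-synchronous BFS over a simulated R x C processor grid."""
    local = partition_2d(edges, R, C)
    level = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        found = set()
        for u in frontier:
            row = u % R
            # Expand step: only the C processors in grid row `row`
            # hold edges leaving u, so only they need to see u.
            for col in range(C):
                for v in local.get((row, col), {}).get(u, ()):
                    found.add(v)
        # Fold step (simulated): newly found neighbours are merged and
        # filtered against the vertices that already have a level.
        frontier = {v for v in found if v not in level}
        for v in frontier:
            level[v] = depth
    return level


if __name__ == "__main__":
    graph = symmetrize([(0, 1), (1, 2), (2, 3), (0, 4), (4, 3), (5, 6)])
    print(bfs_2d(graph, source=0, R=2, C=2))
```

In this toy mapping the search result is independent of the grid shape; what changes is how many processors each frontier vertex has to be sent to, which is the communication cost the abstract refers to.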