Many emerging large-scale data science applications require searching large graphs distributed across multiple memories and processors. This paper presents a distributed breadth-first search (BFS) scheme that scales for random graphs with up to three billion vertices and 30 billion edges. Scalability was tested on IBM BlueGene/L with 32,768 nodes at the Lawrence Livermore National Laboratory. Scalability was obtained through a series of optimizations, in particular those that ensure scalable use of memory. We use 2D (edge) partitioning of the graph instead of conventional 1D (vertex) partitioning to reduce communication overhead. For Poisson random graphs, we show that the expected size of the messages is scalable for both 2D and 1D partitionings. Finally, we have developed efficient collective communication functions for the 3D torus architecture of BlueGene/L that also take advantage of the structure in the problem. The performance and characteristics of the algorithm are measured and reported.
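The sketch below illustrates why 2D (edge) partitioning can confine BFS communication. It is a single-process simulation of a level-synchronous BFS on a logical R×C processor grid, not the authors' BlueGene/L implementation. It makes several assumptions, all hypothetical:

- Vertices are assigned to processors round-robin.
- The directed edge u→v is stored on the processor in v's owner's grid row and u's owner's grid column.
- The functions `owner`, `partition_edges_2d`, and `bfs_2d` are illustrative names.
- The two message exchanges, which this sketch calls "expand" and "fold", are simulated with dictionaries rather than MPI collectives.

```python
from collections import defaultdict


def owner(v, R, C):
    """Hypothetical vertex ownership: round-robin over the R x C grid -> (row, col)."""
    return divmod(v % (R * C), C)


def partition_edges_2d(edges, R, C):
    """Assumed 2D layout: edge u->v lives on (row of owner(v), col of owner(u))."""
    local = defaultdict(lambda: defaultdict(list))  # proc -> u -> [v, ...]
    for u, v in edges:
        proc = (owner(v, R, C)[0], owner(u, R, C)[1])
        local[proc][u].append(v)
    return local


def bfs_2d(edges, source, R, C):
    """Level-synchronous BFS simulated on an R x C processor grid.

    Returns a dict mapping each reachable vertex to its BFS level.
    """
    local = partition_edges_2d(edges, R, C)
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        # Expand: a frontier vertex u only needs to reach the processors in
        # its owner's grid column, because only they store edges out of u.
        col_frontier = defaultdict(list)
        for u in frontier:
            col_frontier[owner(u, R, C)[1]].append(u)

        # Local scan, then fold: each discovered v is sent to its owner,
        # which by construction sits in the same grid row as the sender.
        inbox = defaultdict(set)  # owning proc -> candidate vertices
        for i in range(R):
            for j in range(C):
                for u in col_frontier[j]:
                    for v in local[(i, j)].get(u, ()):
                        assert owner(v, R, C)[0] == i  # fold stays row-local
                        inbox[owner(v, R, C)].add(v)

        # Owners keep only vertices not yet visited; these form the next level.
        depth += 1
        next_frontier = []
        for proc in sorted(inbox):
            for v in sorted(inbox[proc]):
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 3), (5, 6)]
    print(bfs_2d(edges, 0, R=2, C=2))
```

Under these assumptions, the expand exchange involves only the R processors of one grid column, and the fold exchange involves only the C processors of one grid row. Under 1D partitioning, any of the P = R·C processors may need to message any other. This is the communication-reducing property the abstract attributes to 2D partitioning. The sketch does not model message sizes or the torus-aware collectives.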