Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2

Authors: David A. Bader; Kamesh Madduri
Affiliations: Georgia Institute of Technology, USA (both authors)
Venue: ICPP '06, Proceedings of the 2006 International Conference on Parallel Processing
Year: 2006

Abstract

Graph abstractions are used extensively to understand and solve challenging computational problems in many scientific and engineering domains, and they have gained particular prominence in recent years for applications involving large-scale networks. In this paper, we present fast parallel implementations of three fundamental graph problems on multithreaded architectures such as the Cray MTA-2: Breadth-First Search, st-connectivity, and shortest paths in unweighted graphs. The architectural features of the MTA-2 aid the design of simple, scalable, and high-performance graph algorithms. We test our implementations on large scale-free and sparse random graph instances and report strong results for both execution time and parallel speedup. For instance, Breadth-First Search on a scale-free graph with 400 million vertices and 2 billion edges takes less than 5 seconds on a 40-processor MTA-2 system, with an absolute speedup of close to 30. This result is significant because prior implementations of parallel graph algorithms report very limited or no speedup on irregular, sparse graphs relative to the best sequential implementation.
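The abstract names the three problems but does not describe the algorithms. As an illustration only, the sketch below shows one standard way to solve them on an unweighted, undirected graph stored as adjacency lists (an assumed representation). `bfs_levels` is a level-synchronous BFS whose per-level frontier loop is the natural unit of parallel work. `st_distance` answers st-connectivity with a bidirectional BFS and also returns the s-t shortest-path length. Both function names are hypothetical. This is sequential Python and is not the authors' MTA-2 implementation. The bidirectional strategy and the "expand the smaller frontier" rule are illustrative choices, not details taken from the paper.

```python
from typing import List, Optional


def bfs_levels(adj: List[List[int]], source: int) -> List[int]:
    """Level-synchronous BFS on an unweighted graph.

    Returns dist, where dist[v] is the hop count (unweighted shortest-path
    length) from source to v, or -1 if v is unreachable.
    """
    dist = [-1] * len(adj)
    dist[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        next_frontier = []
        # Vertices in one frontier can be expanded independently; this loop is
        # what a parallel version would spread across threads. There, the
        # check-and-set on dist[v] below would have to be atomic.
        for u in frontier:
            for v in adj[u]:
                if dist[v] == -1:  # first visit claims v
                    dist[v] = level + 1
                    next_frontier.append(v)
        frontier = next_frontier
        level += 1
    return dist


def st_distance(adj: List[List[int]], s: int, t: int) -> Optional[int]:
    """Bidirectional BFS: returns the s-t hop distance, or None if t is not
    reachable from s (i.e. the st-connectivity answer is "no")."""
    if s == t:
        return 0
    dist_s = [-1] * len(adj)
    dist_t = [-1] * len(adj)
    dist_s[s] = 0
    dist_t[t] = 0
    front_s, front_t = [s], [t]
    while front_s and front_t:
        # Heuristic (assumed): expand one full level of the smaller frontier.
        expand_s = len(front_s) <= len(front_t)
        front = front_s if expand_s else front_t
        here = dist_s if expand_s else dist_t
        other = dist_t if expand_s else dist_s
        best: Optional[int] = None
        nxt = []
        for u in front:
            for v in adj[u]:
                if other[v] != -1:  # the two searches meet across edge (u, v)
                    cand = here[u] + 1 + other[v]
                    if best is None or cand < best:
                        best = cand
                if here[v] == -1:
                    here[v] = here[u] + 1
                    nxt.append(v)
        if best is not None:
            return best  # minimum over the whole level is the exact distance
        if expand_s:
            front_s = nxt
        else:
            front_t = nxt
    return None  # one side exhausted its component without meeting the other


if __name__ == "__main__":
    # Small example graph: path 0-1-3-4 (plus 0-2-3), vertex 5 isolated.
    adj = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3], []]
    print(bfs_levels(adj, 0))     # [0, 1, 1, 2, 3, -1]
    print(st_distance(adj, 0, 4)) # 3
    print(st_distance(adj, 0, 5)) # None
```

Finishing a whole level before returning matters in `st_distance`: the first meeting edge found is not necessarily on a shortest path, but the minimum over all meeting edges in that level is.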